GPU-Accelerated Mobile 3D Image Style Transfer

Photography has come a long way from its early monochromatic days, and emerging technologies are opening doors for dynamic 3D imaging and new creative possibilities. Leia Inc., a leading provider of Lightfield hardware and content services, is a startup at the forefront of this innovation. Recently, the team at Leia has been using Spell to augment their Lightfield computer vision research. This blog post is an overview of their recently published paper, GPU-Accelerated Mobile Multi-view Style Transfer.

[Image: 3D photo of five men standing in a row with a painterly style applied]

Background

With the rise of increasingly capable mobile phones, generating a 3D image is now just an app and a few finger taps away. Immersive and lifelike, these 3D photos are an exciting new way to make photography a more interactive experience. As platforms such as Facebook 3D Photos and Holopix™ make it easier for consumers to capture, edit, and share these kinds of photos, remarkable creative possibilities are coming to light.

Traditional 2D photos on mobile devices have long enjoyed the creative freedom afforded by artistic styles, effects, and other processing capabilities. The ease with which these artistic effects can be applied is central to mobile photography’s appeal, and consumers will naturally expect the same stylistic enhancements to be available for 3D photos.

However, creating 3D photos on a mobile device presents new technical challenges — information from multiple viewpoints must be synthesized, the algorithms require large amounts of processing power, and there are heavy storage and bandwidth requirements. The more viewpoints being used to generate the photo, the more realistic it looks, but the higher these requirements become. This presents a challenge for consumers accustomed to instant results on mobile devices, which often have limited computing power.

In recent years, artistic style transfer using neural networks has grown in popularity through apps such as Microsoft Pix and Prisma. Now, research is bringing the technique to 3D photos, allowing for greater artistic expression while also masking imperfections that may arise during 3D image generation. The process remains challenging, however, because applying single-view neural styling techniques to multi-view images inevitably results in inconsistent styling across views, leading to jarring visual artifacts.

Some techniques have been developed to address these style transfer inconsistencies, yet these prior solutions are limited in both scope and efficiency. They often consider only two views and do not extend to multiple views. Furthermore, they are optimized for specific styles and need to be retrained for new ones. They are also computationally expensive and would be difficult to run in real time on mobile devices.

The authors of the research paper “GPU-Accelerated Mobile Multi-view Style Transfer” have developed an end-to-end pipeline for applying styles to multi-view images that eliminates the earlier consistency problems while maintaining sufficient performance for on-demand results on mobile platforms. Each component of the pipeline is configurable and modular, so different algorithms can be substituted with ease. With GPU acceleration, the styling can be performed in roughly 0.5 seconds.
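The post describes this modularity at a high level rather than as code, but it can be pictured as a handful of swappable stages. Below is a minimal Python sketch under that reading; every name in it (MultiViewStylePipeline, stylize, synthesize_view, edge_filter, render_outputs) is a hypothetical placeholder, not the authors' implementation.

```python
# A minimal sketch of how such a modular pipeline could be organized.
# Every name below is a hypothetical placeholder, not the authors' published code.
from dataclasses import dataclass
from typing import Callable, List
import numpy as np

Image = np.ndarray      # H x W x 3 float array
Disparity = np.ndarray  # H x W float array of per-pixel disparities

@dataclass
class MultiViewStylePipeline:
    stylize: Callable[[Image], Image]                            # any single-image style network
    synthesize_view: Callable[[Image, Disparity, float], Image]  # disparity-based re-projection
    edge_filter: Callable[[Image, Disparity], Image]             # post-styling edge sharpening
    render_outputs: Callable[[Image, Disparity, Image, Disparity, int], List[Image]]

    def run(self, left: Image, disp_left: Disparity,
            disp_right: Disparity, num_views: int) -> List[Image]:
        # The raw right view is only needed upstream (e.g. for disparity estimation),
        # so it does not appear here.
        styled_left = self.stylize(left)                                  # 1. style exactly one view
        styled_right = self.synthesize_view(styled_left, disp_left, 1.0)  # 2. project to the right viewpoint
        styled_left = self.edge_filter(styled_left, disp_left)            # 3. sharpen edges in both anchors
        styled_right = self.edge_filter(styled_right, disp_right)
        return self.render_outputs(styled_left, disp_left,                # 4. repeated view synthesis
                                   styled_right, disp_right, num_views)
```

Because each stage is just an injected callable, a different style network or view synthesis routine can be dropped in without touching the rest of the pipeline, which is the kind of flexibility the authors describe.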

Method

The Leia Inc. team’s method uses a style transfer neural network to stylize only one of the input views before synthesizing the remaining views. This approach provides three key benefits: style consistency between views, performance advantages from stylizing just once (rather than stylizing each output view individually), and compatibility with existing style transfer networks (no retraining is needed).

The pipeline starts with left and right views and their corresponding disparity maps. First, the left view is stylized using an existing style transfer method that operates on an individual image. Next, view synthesis is used to project the stylized left view to the viewpoint of the input right view, producing a stylized right view.
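The post does not spell out the view synthesis algorithm itself; a common stand-in for this kind of disparity-based re-projection is a simple forward warp with a z-buffer to resolve collisions. The toy version below is purely illustrative (the sign convention of shift, the left-black holes, and the per-pixel loop are all simplifying assumptions), not the pipeline's actual GPU implementation.

```python
import numpy as np

def warp_with_disparity(image: np.ndarray, disparity: np.ndarray,
                        shift: float = 1.0) -> np.ndarray:
    """Shift every pixel horizontally by shift * disparity[y, x] pixels,
    keeping the nearest (largest-disparity) source pixel when two collide.
    Depending on how disparity is signed, shift may need to be negated."""
    h, w = disparity.shape
    out = np.zeros_like(image)            # holes where no source pixel lands stay black
    zbuf = np.full((h, w), -np.inf)
    ys, xs = np.indices((h, w))
    xt = np.clip(np.round(xs + shift * disparity).astype(int), 0, w - 1)
    for y, x_src, x_dst in zip(ys.ravel(), xs.ravel(), xt.ravel()):
        d = disparity[y, x_src]
        if d > zbuf[y, x_dst]:
            zbuf[y, x_dst] = d
            out[y, x_dst] = image[y, x_src]
    return out

# Usage sketch: style the left view once with any off-the-shelf single-image
# style transfer network, then re-project the result to the right viewpoint.
# styled_left = some_style_network(left_view)
# styled_right = warp_with_disparity(styled_left, left_disparity, shift=1.0)
```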

The view synthesis algorithm runs on both CPU and GPU. The stylized views are then filtered to sharpen the edges of the original 3D objects, which improves the viewer’s perception of the final image. In the final step, the desired output views are synthesized from the filtered stylized left and right views and their associated disparity maps, through repeated application of the same view synthesis algorithm used earlier.
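Continuing the sketch, one plausible render_outputs stage sweeps a fractional baseline position and warps both stylized anchor views toward it using the warp_with_disparity helper above. The linear blend and the absence of explicit occlusion handling are simplifications for illustration, not the paper's exact procedure.

```python
import numpy as np

def render_outputs(styled_left, disp_left, styled_right, disp_right, num_views):
    views = []
    for t in np.linspace(0.0, 1.0, num_views):
        # Warp each stylized anchor a fraction of the baseline toward position t.
        from_left = warp_with_disparity(styled_left, disp_left, shift=float(t))
        from_right = warp_with_disparity(styled_right, disp_right, shift=float(t) - 1.0)
        # Simple view-dependent linear blend of the two warped anchors.
        views.append((1.0 - t) * from_left + t * from_right)
    return views
```

Since each additional output view only adds one more warp (and no extra style transfer), this kind of scheme grows linearly in cost with the number of views.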

The process for this pipeline was fine-tuned through several iterations before the team landed on a final approach capable of addressing all of the task’s challenges (such as ghosting artifacts and issues with depth effects).

Summary

The authors’ approach to producing stylized multi-view images has enabled the successful production of consistent stylized images with on-demand performance on GPU-accelerated mobile devices. Their unique pipeline scales linearly with the number of views generated, a departure from previous methods that increased greatly in runtime as the number of views increased. 

Furthermore, the pipeline performs faster than other methods and is significantly faster when run on a GPU. The pipeline processed up to 16 views on GPU in close to 0.5 seconds, while the baseline comparison approach took over 5 seconds. On a CPU, the new pipeline took about 2.5 seconds while the comparison approach took over 33 seconds.

Stylized images shared on the Holopix™ platform were well received: they enjoyed a 20% higher engagement rate than non-stylized images and over 300% higher engagement than the platform’s average engagement per image. Qualitatively, when viewing the results on a Lightfield display, one can see that the new pipeline produces better images that are easier for the eye to focus on than those from previous methods.

Overall, this method shows promise for quickly producing eye-pleasing stylized images, and the higher engagement on 3D photo sharing platforms suggests it may be exactly what consumers are looking for.

To read the full paper, visit https://arxiv.org/abs/2003.00706.

Get a personalized demo of Spell’s streamlined MLOps platform: https://spell.ml/demo
