GPU-Accelerated Mobile 3D Image Style Transfer

Leia Inc. is the leading provider of Lightfield hardware and content services. They have been using Spell to assist with their research in advancing Lightfield computer vision. The team recently published “GPU-Accelerated Mobile Multi-view Style Transfer”. This post gives an overview of that work; the detailed research paper can be found here:

With the emergence of mobile phones having multiple back-facing cameras, it has become possible to generate 3D images on the phone using a variety of applications. These 3D photos are an exciting new way to make photography more immersive and life-like. A popular app in this space, Holopix™, captures multiple viewpoints of a scene with enhanced depth and parallax effects. Platforms such as Facebook 3D Photos and Holopix™ also make it easy for consumers to capture, edit and share these kinds of photos. 

Traditional 2D photos on mobile devices have long had many options for adding artistic styles, effects and other processing. These artistic effects are a key part of mobile photography’s appeal, and consumers expect enhanced styles to be available for 3D photos as well. To create a 3D image, information from multiple viewpoints must be synthesized. Recent advances in algorithms such as view synthesis and in-painting have made it possible to do this even on mobile devices.

However, creating 3D photos on a mobile device presents new technical challenges: the algorithms require large amounts of processing power, storage and bandwidth. The more viewpoints used to generate the photo, the higher these requirements become. Mobile devices often have limited computing power to begin with, and consumers expect relatively quick results (~0.5 seconds).

In recent years, artistic style transfer using neural networks has become popular through programs such as Microsoft Pix and Prisma. Research has now turned towards bringing this technique to 3D photos. An added benefit of applying style transfer to multi-view images (a storage container for 3D photos) is that artifacts or imperfections from the 3D generation process generally become unnoticeable once the effect is applied. Unfortunately, simply applying neural styling to each of the individual views that make up a multi-view image usually produces inconsistently styled views. The inconsistent styling between views causes jarring visual effects, which make the result less pleasing for the viewer.

Although some techniques have been developed to address inconsistencies between views when applying artistic styles to individual views, these prior solutions are limited in scope and efficiency. They only consider two views and aren’t directly extendable to multiple views. Furthermore, they’re optimized for specific styles and need to be retrained for new styles. They also are not computationally efficient and would be difficult to run in real-time on mobile devices.

In the research paper “GPU-Accelerated Mobile Multi-view Style Transfer”, the authors develop an end-to-end pipeline for applying styles to multi-view images without consistency problems, and with performance sufficient for on-demand results on mobile platforms. This pipeline addresses problems with previous approaches and is the first solution to efficiently apply styling without consistency issues between views. Each component of the pipeline is configurable and modular, so different algorithms can be substituted if desired. GPU acceleration is used to deliver the necessary performance (~0.5 seconds).


The method uses a style transfer neural network and stylizes only one of the input views; the remaining views are then synthesized from that stylized view. This provides three main benefits: style consistency between views, performance advantages from stylizing once (instead of stylizing each output view individually), and compatibility with existing style transfer networks (no retraining is needed).

The pipeline starts out with left and right views and corresponding disparity maps. The left view is stylized using existing style transfer methods that operate on an individual image. Next, view synthesis is used to project the stylized left view to a stylized right view, at the same viewpoint as the input right view. 
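The projection step can be illustrated with a minimal sketch of disparity-based forward warping. This is not the authors' implementation: the function name `warp_view`, the sign convention (a left-view pixel at column x lands at column x minus its disparity), the nearest-pixel rounding, and the rule that larger disparity wins a collision are all assumptions made for illustration.

```python
def warp_view(view, disparity, shift=1.0):
    """Forward-warp `view` horizontally by `shift * disparity` pixels.

    view:      list of rows, each a list of pixel values (the stylized left view).
    disparity: same shape, horizontal offset in pixels for every pixel.
    Returns (warped, filled): the warped image and a mask of pixels that
    received a value. Unfilled pixels are holes (disocclusions).
    """
    height, width = len(view), len(view[0])
    warped = [[0] * width for _ in range(height)]
    filled = [[False] * width for _ in range(height)]
    # Largest disparity written to each target pixel so far.
    best = [[float("-inf")] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            d = disparity[y][x]
            tx = x - int(round(shift * d))  # assumed sign convention
            # Assumption: larger disparity means nearer, so it wins collisions.
            if 0 <= tx < width and d > best[y][tx]:
                warped[y][tx] = view[y][x]
                best[y][tx] = d
                filled[y][tx] = True
    return warped, filled


stylized_left = [[10, 20, 30, 40]]
disparity_left = [[0, 0, 1, 1]]
stylized_right, mask = warp_view(stylized_left, disparity_left)
print(stylized_right)  # [[10, 30, 40, 0]]
print(mask)            # [[True, True, True, False]]
```

In this toy example the last pixel of the warped view receives no value. Holes of this kind are what an in-painting step would need to fill.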

The view synthesis algorithm can run on either the CPU or the GPU. The stylized views are then filtered to sharpen the edges of objects in the original 3D scene, which improves the viewer’s perception of the final image.
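This post does not say which filter is used for the edge-refinement step. As a stand-in, the sketch below applies a joint bilateral filter to a single image row: the stylized pixels are smoothed, but the weights come from the original (unstylized) view, so smoothing does not cross edges present in the original. The function name and all parameter values are hypothetical.

```python
import math


def joint_bilateral_row(stylized, guide, radius=2, sigma_space=2.0, sigma_range=10.0):
    """Smooth `stylized` while respecting edges found in `guide`.

    Both arguments are single image rows of equal length. `guide` is assumed
    to be the matching row of the original, unstylized view.
    """
    out = []
    n = len(stylized)
    for x in range(n):
        total, weight_sum = 0.0, 0.0
        for k in range(max(0, x - radius), min(n, x + radius + 1)):
            # Spatial weight: nearby pixels count more.
            w_space = math.exp(-((k - x) ** 2) / (2 * sigma_space ** 2))
            # Range weight from the guide: pixels across an edge count less.
            w_range = math.exp(-((guide[k] - guide[x]) ** 2) / (2 * sigma_range ** 2))
            total += w_space * w_range * stylized[k]
            weight_sum += w_space * w_range
        out.append(total / weight_sum)
    return out


guide_row = [0, 0, 0, 100, 100, 100]      # sharp edge in the original view
stylized_row = [10, 12, 11, 50, 52, 51]   # noisy stylized values
filtered = joint_bilateral_row(stylized_row, guide_row)
```

With these inputs, the values on each side of the edge are averaged only with neighbours on the same side, so the step between the third and fourth pixel is preserved.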

In the final step of the process, the desired output views are synthesized from the filtered stylized left and right views, and associated disparity maps. This is done through repeated application of the same view synthesis algorithm used previously. 
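One way to picture this final step is the sketch below, which generates evenly spaced views between the left (t = 0) and right (t = 1) viewpoints for a single image row. It is a simplified illustration under several assumptions that are not taken from the paper: the helper names, the disparity sign convention, the nearest-pixel rounding, and the linear blend of the two warped views are all hypothetical.

```python
def shift_row(row, disp, amount):
    """Forward-warp one image row by amount * disparity; None marks holes."""
    out = [None] * len(row)
    depth = [float("-inf")] * len(row)
    for x, (value, d) in enumerate(zip(row, disp)):
        tx = x - int(round(amount * d))
        # Assumption: larger disparity means nearer, so it wins collisions.
        if 0 <= tx < len(row) and d > depth[tx]:
            out[tx], depth[tx] = value, d
    return out


def synthesize_views(left, right, disp_left, disp_right, num_views):
    """Return num_views rows at evenly spaced positions from left to right."""
    views = []
    for i in range(num_views):
        t = i / (num_views - 1) if num_views > 1 else 0.0
        # Each output view costs two warps, so total work grows with num_views.
        from_left = shift_row(left, disp_left, t)
        from_right = shift_row(right, disp_right, -(1.0 - t))
        row = []
        for a, b in zip(from_left, from_right):
            if a is not None and b is not None:
                row.append((1.0 - t) * a + t * b)  # blend by proximity
            else:
                row.append(a if a is not None else b)  # fall back to one side
        views.append(row)
    return views


flat = [0, 0, 0, 0]  # zero disparity keeps the toy example easy to follow
views = synthesize_views([1, 2, 3, 4], [5, 6, 7, 8], flat, flat, num_views=3)
print(views[1])  # [3.0, 4.0, 5.0, 6.0]
```

Because the stylized left and right views are computed once and every additional output view only adds warping work, a structure like this grows in cost in proportion to the number of views.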

The pipeline went through several iterations before the authors settled on the final approach, which addresses issues such as ghosting artifacts and problems with depth effects.


The authors’ approach produces consistent stylized multi-view images with on-demand performance on GPU-accelerated mobile devices. The pipeline's runtime scales linearly with the number of views generated, unlike other methods whose runtime increases greatly as the number of views grows.

The pipeline performs faster than other methods, and is significantly faster when run on a GPU rather than CPU. The pipeline processed up to 16 views on GPU in close to 0.5 seconds, while the baseline comparison approach took over 5 seconds. On a CPU, the new pipeline took about 2.5 seconds while the comparison approach took over 33 seconds.
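As a rough check on these figures, the short calculation below turns the quoted timings into speedup factors. The numbers are the approximate values from the paragraph above ("close to 0.5", "over 5", "about 2.5", "over 33" seconds), so the ratios should be read as ballpark estimates rather than measured results.

```python
# Approximate timings quoted in the text, in seconds.
timings = {
    "gpu": {"pipeline": 0.5, "baseline": 5.0},
    "cpu": {"pipeline": 2.5, "baseline": 33.0},
}


def speedup(slow, fast):
    """How many times faster `fast` is than `slow`."""
    return slow / fast


gpu_speedup = speedup(timings["gpu"]["baseline"], timings["gpu"]["pipeline"])
cpu_speedup = speedup(timings["cpu"]["baseline"], timings["cpu"]["pipeline"])
gpu_over_cpu = speedup(timings["cpu"]["pipeline"], timings["gpu"]["pipeline"])

print(f"GPU: pipeline is roughly {gpu_speedup:.1f}x faster than the baseline")
print(f"CPU: pipeline is roughly {cpu_speedup:.1f}x faster than the baseline")
print(f"Pipeline on GPU is roughly {gpu_over_cpu:.1f}x faster than on CPU")
```

Since the baseline timings are quoted as "over" 5 and 33 seconds, the first two ratios (about 10x and 13x) are lower bounds on the actual speedup.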

Stylized images shared on the Holopix™ platform were well received - they had 20% higher engagement than non-stylized images, and over 300% higher engagement than the platform’s average engagement per image. Qualitatively, when viewing the results on a Lightfield display one can see that the new pipeline produces better images that are easier for the eye to focus on than previous methods.

Overall, this method shows promise for quickly producing stylized 3D images that are pleasing to the eye. That could be exactly what consumers are looking for, and it could drive higher engagement on 3D photo sharing platforms.

Read the full detailed research paper here:

