Learning-based view synthesis for light field cameras

With the introduction of consumer light field cameras, light field imaging has recently become widespread. However, there is an inherent trade-off between angular and spatial resolution, and thus these cameras often sample sparsely in either the spatial or the angular domain. In this paper, we use machine learning to mitigate this trade-off. Specifically, we propose a novel learning-based approach to synthesize new views from a sparse set of input views. We build upon existing view synthesis techniques and break the process down into disparity and color estimation components. We use two sequential convolutional neural networks to model these two components and train both networks simultaneously by minimizing the error between the synthesized and ground truth images. We demonstrate the performance of our approach using only the four corner sub-aperture views from light fields captured with the Lytro Illum camera. Experimental results show that our approach synthesizes high-quality images superior to those of state-of-the-art techniques on a variety of challenging real-world scenes. We believe our method could decrease the required angular resolution of consumer light field cameras, allowing their spatial resolution to increase.
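The first stage of the pipeline described above estimates a disparity map for the novel view; the synthesis step then warps each input sub-aperture view toward the novel view position according to that disparity and the angular offset between views. The sketch below illustrates only this warping step with a minimal nearest-neighbor backward warp in NumPy. It is a hypothetical illustration, not the paper's implementation: the function name `warp_view`, the `(du, dv)` offset convention, and the given-disparity assumption are all ours, and the paper's method additionally learns the disparity with one CNN and the final color with a second.

```python
import numpy as np

def warp_view(view, disparity, du, dv):
    """Backward-warp one input sub-aperture view toward a novel view.

    view:      (H, W) grayscale input view.
    disparity: (H, W) per-pixel disparity for the novel view (in the
               paper this comes from the first CNN; here it is given).
    (du, dv):  angular offset from the input view to the novel view.
    Returns the warped (H, W) image (nearest-neighbor resampling; the
    real system would use sub-pixel/bilinear interpolation).
    """
    h, w = view.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Each novel-view pixel samples the input view at a location shifted
    # by disparity times the angular offset, clamped to the image bounds.
    src_x = np.clip(np.round(xs + du * disparity), 0, w - 1).astype(int)
    src_y = np.clip(np.round(ys + dv * disparity), 0, h - 1).astype(int)
    return view[src_y, src_x]

# Toy stand-in for the full pipeline: warp four corner views to the
# center and average them (the paper instead feeds the warped views to
# a second, color-estimation CNN rather than averaging).
def synthesize_center(corner_views, disparity, offsets):
    warped = [warp_view(v, disparity, du, dv)
              for v, (du, dv) in zip(corner_views, offsets)]
    return np.mean(warped, axis=0)
```

With a constant disparity of one pixel and an angular offset of `(2, 0)`, every pixel simply samples two columns to the right, which makes the behavior easy to verify on a ramp image.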
