Learning to Synthesize a 4D RGBD Light Field from a Single Image

We present a machine learning algorithm that takes as input a 2D RGB image and synthesizes a 4D RGBD light field (color and depth of the scene in each ray direction). For training, we introduce the largest public light field dataset, consisting of over 3300 plenoptic camera light fields of scenes containing flowers and plants. Our synthesis pipeline consists of a convolutional neural network (CNN) that estimates scene geometry, a stage that renders a Lambertian light field using that geometry, and a second CNN that predicts occluded rays and non-Lambertian effects. Our algorithm builds on recent view synthesis methods, but is unique in predicting RGBD for each light field ray and improving unsupervised single image depth estimation by enforcing consistency of ray depths that should intersect the same scene point.

[1]  Jiajun Wu,et al.  Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling , 2016, NIPS.

[2]  Ali Farhadi,et al.  Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks , 2016, ECCV.

[3]  G. Stiny Shape , 1999 .

[4]  Richard Szeliski,et al.  The lumigraph , 1996, SIGGRAPH.

[5]  Thomas Brox,et al.  Multi-view 3D Models from Single Images with a Convolutional Network , 2015, ECCV.

[6]  George Drettakis,et al.  Depth synthesis and local warps for plausible image-based navigation , 2013, TOGS.

[7]  Frédo Durand,et al.  Light Field Reconstruction Using Sparsity in the Continuous Fourier Domain , 2014, ACM Trans. Graph..

[8]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[9]  Sepp Hochreiter,et al.  Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.

[10]  DrettakisGeorge,et al.  Depth synthesis and local warps for plausible image-based navigation , 2013 .

[11]  R. Szeliski,et al.  Ambient point clouds for view interpolation , 2010, ACM Trans. Graph..

[12]  Scott E. Reed,et al.  Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis , 2015, NIPS.

[13]  Abhinav Gupta,et al.  Learning a Predictable and Generative Vector Representation for Objects , 2016, ECCV.

[14]  Qionghai Dai,et al.  Light field from micro-baseline image pair , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Max Jaderberg,et al.  Unsupervised Learning of 3D Structure from Images , 2016, NIPS.

[16]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Pat Hanrahan,et al.  Digital correction of lens aberrations in light field photography , 2006, International Optical Design Conference.

[18]  Oisin Mac Aodha,et al.  Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Frédo Durand,et al.  Joint view expansion and filtering for automultiscopic 3D displays , 2013, ACM Trans. Graph..

[20]  John Flynn,et al.  Deep Stereo: Learning to Predict New Views from the World's Imagery , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Ting-Chun Wang,et al.  Learning-based view synthesis for light field cameras , 2016, ACM Trans. Graph..

[22]  P. Hanrahan,et al.  Light Field Photography with a Hand-held Plenoptic Camera , 2005 .

[23]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[24]  Michael Bosse,et al.  Unstructured lumigraph rendering , 2001, SIGGRAPH.

[25]  Ravi Ramamoorthi,et al.  Oriented Light-Field Windows for Scene Flow , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[26]  In-So Kweon,et al.  Accurate depth map estimation from a lenslet light field camera , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[28]  Marc Levoy,et al.  Light field rendering , 1996, SIGGRAPH.

[29]  Honglak Lee,et al.  Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision , 2016, NIPS.

[30]  Ashutosh Saxena,et al.  Learning 3-D Scene Structure from a Single Still Image , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[31]  Hao Su,et al.  A Point Set Generation Network for 3D Object Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[33]  Jitendra Malik,et al.  Shape, Illumination, and Reflectance from Shading , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Richard Szeliski,et al.  High-quality video view interpolation using a layered representation , 2004, SIGGRAPH 2004.

[35]  Jitendra Malik,et al.  View Synthesis by Appearance Flow , 2016, ECCV.

[36]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[37]  Silvio Savarese,et al.  3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction , 2016, ECCV.

[38]  Gustavo Carneiro,et al.  Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue , 2016, ECCV.