Zero-Shot Depth Estimation From Light Field Using A Convolutional Neural Network

This article proposes a zero-shot learning framework for light field depth estimation, which learns an end-to-end mapping from an input light field to the corresponding disparity map with neither extra training data nor supervision from ground-truth depth. The proposed method overcomes two major difficulties of existing learning-based methods and is thus far more practical. First, it eliminates the heavy burden of collecting ground-truth depth for a wide variety of scenes to serve as training labels. Second, it avoids the severe domain-shift effect that arises when a trained model is applied to light fields whose content or camera configuration differs drastically from the training data. Compared with conventional non-learning-based methods, on the other hand, the proposed method better exploits the correlations in the 4D light field and generates markedly superior depth results. Moreover, we extend this zero-shot learning framework to depth estimation from light field videos. For the first time, we demonstrate that more accurate and robust depth can be estimated from light field videos by jointly exploiting the correlations across the spatial, angular, and temporal dimensions. We conduct comprehensive experiments on both synthetic and real-world light field image datasets, as well as a self-collected light field video dataset. Quantitative and qualitative results validate the superior performance of our method over the state of the art, especially on challenging real-world scenes.
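The abstract does not spell out the training objective, but the core idea of zero-shot (self-supervised) depth from a light field can be illustrated with a photometric-consistency loss: warp each sub-aperture view toward the center view using a candidate disparity map, and penalize the mismatch. A loss of this form can supervise a network using only the input light field itself, with no ground-truth depth. The sketch below is a minimal NumPy illustration of that principle; the function names, the nearest-neighbor sampling, and the L1 loss form are assumptions for illustration, not the authors' actual implementation.

```python
import numpy as np

def warp_view(view, disparity, dv, du):
    """Warp a sub-aperture view toward the center view.

    Under the standard light field parameterization, the pixel (y, x)
    of the center view appears at (y + dv * d, x + du * d) in the view
    with angular offset (dv, du), where d is the per-pixel disparity.
    Nearest-neighbor sampling (with border clamping) keeps the sketch
    short; a real network would use differentiable bilinear sampling.
    """
    h, w = disparity.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys + dv * disparity).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + du * disparity).astype(int), 0, w - 1)
    return view[src_y, src_x]

def photometric_loss(center, views, offsets, disparity):
    """Mean L1 error between the center view and each warped neighbor.

    This self-supervised signal is what can replace ground-truth depth
    labels in a zero-shot framework: a correct disparity map makes all
    warped views agree with the center view.
    """
    errors = [np.abs(center - warp_view(v, disparity, dv, du)).mean()
              for v, (dv, du) in zip(views, offsets)]
    return float(np.mean(errors))
```

For a fronto-parallel scene with constant disparity d, a neighboring view is simply the center view shifted by (dv * d, du * d), so the loss is near zero at the true disparity and grows for wrong candidates; minimizing it over a network's disparity output is the zero-shot training loop in miniature.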
