Light field video capture using a learning-based hybrid imaging system

Light field cameras have many advantages over traditional cameras, as they allow the user to change various camera settings after capture. However, capturing light fields requires enormous bandwidth to record the data: a modern light field camera can capture only three images per second. This prevents current consumer light field cameras from recording light field videos. Temporal interpolation at such an extreme scale (10x, from 3 fps to 30 fps) is infeasible, as too much information is entirely missing between adjacent frames. Instead, we develop a hybrid imaging system that adds a standard video camera to capture the temporal information. Given a 3 fps light field sequence and a standard 30 fps 2D video, our system generates a full light field video at 30 fps. We adopt a learning-based approach, which can be decomposed into two steps: spatio-temporal flow estimation and appearance estimation. The flow estimation propagates the angular information from the light field sequence to the 2D video, allowing us to warp the input images to the target view. The appearance estimation then combines these warped images to produce the final pixels. The whole pipeline is trained end-to-end using convolutional neural networks. Experimental results demonstrate that our algorithm outperforms current video interpolation methods, enabling consumer light field videography and making applications such as refocusing and parallax view generation achievable on video for the first time.
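The inference pipeline described above can be sketched at a high level: predicted flows warp the input images toward the target view, and an appearance step blends the warped candidates into the output. A minimal NumPy illustration follows; in the actual system both the flows and the blending are predicted by the trained CNNs, whereas here the flow is an input and fixed per-view weights stand in for the learned appearance combination. The function names are hypothetical.

```python
import numpy as np

def bilinear_warp(img, flow):
    """Warp a grayscale image (H, W) by a dense flow field (H, W, 2)
    using bilinear sampling; flow[..., 0] is the x offset, flow[..., 1] the y offset."""
    H, W = img.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    # Sample coordinates, clamped to the image border.
    x = np.clip(xs + flow[..., 0], 0, W - 1)
    y = np.clip(ys + flow[..., 1], 0, H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.clip(x0 + 1, 0, W - 1), np.clip(y0 + 1, 0, H - 1)
    wx, wy = x - x0, y - y0
    # Bilinear interpolation of the four neighboring pixels.
    return ((1 - wx) * (1 - wy) * img[y0, x0] + wx * (1 - wy) * img[y0, x1]
            + (1 - wx) * wy * img[y1, x0] + wx * wy * img[y1, x1])

def synthesize_view(warped_views, weights):
    """Appearance step: blend warped candidate images into the target view.
    The paper learns this combination with a CNN; a normalized weighted
    average is used here purely as a stand-in."""
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()
    return np.tensordot(w, np.stack(warped_views), axes=1)
```

In the full method each target view of each output frame is synthesized this way: images warped from the sparse light field frames and from the 2D video are fed to the appearance network, which resolves their disagreements (e.g. at occlusions) rather than averaging them uniformly.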
