MonSter: Awakening the Mono in Stereo

Passive depth estimation is one of the longest-studied problems in computer vision. The most common approaches rely on either a stereo or a monocular system. The former requires an accurate calibration process and has a limited effective range. The latter does not require extrinsic calibration but generally achieves inferior depth accuracy; it can, however, be tuned to perform better over part of the depth range. In this work, we propose combining the two frameworks: a two-camera system in which the cameras are used jointly to extract a stereo depth map and individually to provide a monocular depth map from each camera. Combining these depth maps leads to more accurate depth estimation. Moreover, enforcing consistency between the extracted maps yields a novel online self-calibration strategy. We present a prototype camera that demonstrates the benefits of the proposed combination, for both self-calibration and depth reconstruction in real-world scenes.
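The two ideas in the abstract, fusing stereo and monocular depth maps and using their disagreement as a self-calibration signal, can be illustrated with a minimal sketch. This is not the paper's actual method; the confidence-weighted fusion and the scalar consistency residual below are simplifying assumptions for illustration only.

```python
import numpy as np

def fuse_depths(d_stereo, d_mono, w_stereo, w_mono):
    """Confidence-weighted fusion of a stereo depth map with a monocular one.

    In practice the weights could vary per pixel (e.g. favoring stereo at
    close range and monocular depth where stereo matching is unreliable).
    """
    return (w_stereo * d_stereo + w_mono * d_mono) / (w_stereo + w_mono)

def consistency_residual(d_stereo, d_mono):
    """Mean absolute disagreement between the two depth maps.

    A growing residual can flag calibration drift: an online
    self-calibration loop could adjust the stereo extrinsics to
    minimize this quantity.
    """
    return float(np.mean(np.abs(d_stereo - d_mono)))

# Toy example: a flat scene where the two estimates disagree by 0.5 m.
d_stereo = np.full((4, 4), 2.0)
d_mono = np.full((4, 4), 2.5)

fused = fuse_depths(d_stereo, d_mono, w_stereo=0.8, w_mono=0.2)
residual = consistency_residual(d_stereo, d_mono)
```

Here the fused map lands at 2.1 m (closer to the more-trusted stereo estimate), while the residual of 0.5 m would serve as the error signal driving recalibration.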
