Deep eyes: Joint depth inference using monocular and binocular cues

Abstract: The human visual system relies on both monocular focus cues and binocular stereo cues to achieve effective 3D perception. Correspondingly, depth from focus/defocus (DfF/DfD) and stereo matching are two of the most studied passive depth-sensing schemes, and they have traditionally been pursued in separate tracks. The two techniques, however, are essentially complementary: the monocular cue from DfF/DfD robustly handles repetitive textures and occlusions that are problematic for stereo matching, whereas the binocular cue from stereo matching is insensitive to defocus blur and can resolve a large depth range. In this paper, we emulate human perception and present unified learning-based techniques for hybrid DfF/DfD and stereo matching. We first construct a comprehensive focal-stack dataset synthesized by depth-guided light field rendering. Next, we propose different network architectures to suit various inputs, including a focal stack, a stereo image pair, a binocular focal stack, a focus-defocus image pair, and a defocus-stereo image triplet. We also exploit different methods of connecting the separate networks, integrating them into an optimized solution that produces high-fidelity disparity maps. For the experiments, we further explore different hardware setups to capture both monocular and binocular depth cues. Results show that our new learning-based hybrid techniques significantly improve the accuracy and robustness of depth estimation.
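To make the monocular DfF cue concrete, the classical (pre-learning) baseline the paper builds on can be sketched as follows: for each pixel, scan the focal stack and pick the slice where a local sharpness measure peaks. This is a minimal illustrative sketch, not the paper's network; the Laplacian-based focus measure and all function names here are our own assumptions for illustration.

```python
# Minimal depth-from-focus sketch (illustrative only, not the paper's method).
# For each pixel, select the focal-stack slice with the highest local
# sharpness -- here a squared discrete Laplacian response -- as the depth index.

def laplacian(img, y, x):
    """Discrete 4-neighbour Laplacian at (y, x); borders are clamped."""
    h, w = len(img), len(img[0])
    c = img[y][x]
    up    = img[y - 1][x] if y > 0     else c
    down  = img[y + 1][x] if y < h - 1 else c
    left  = img[y][x - 1] if x > 0     else c
    right = img[y][x + 1] if x < w - 1 else c
    return up + down + left + right - 4 * c

def depth_from_focus(stack):
    """stack: list of 2-D images, each focused at a known depth.
    Returns an index map: for each pixel, the slice of maximal sharpness."""
    h, w = len(stack[0]), len(stack[0][0])
    depth = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best, best_s = 0, -1.0
            for i, img in enumerate(stack):
                s = laplacian(img, y, x) ** 2  # focus measure
                if s > best_s:
                    best, best_s = i, s
            depth[y][x] = best
    return depth

# Toy usage: slice 0 contains a sharp vertical edge (in focus), slice 1 is
# uniform (fully defocused), so the edge pixel is assigned to slice 0.
sharp = [[0, 0, 10], [0, 0, 10], [0, 0, 10]]
flat  = [[3, 3, 3],  [3, 3, 3],  [3, 3, 3]]
print(depth_from_focus([sharp, flat])[1][1])  # -> 0
```

A per-pixel argmax like this is exactly the brittle step that fails on textureless regions and noise, which motivates replacing the hand-crafted focus measure with the learned networks described in the abstract.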
