Deep Depth from Defocus: how can defocus blur improve 3D estimation using dense neural networks?

Depth estimation is of critical interest for scene understanding and accurate 3D reconstruction. Most recent deep learning approaches exploit the geometrical structure of standard sharp images to predict depth maps. However, cameras can also produce images with defocus blur that depends on the depth of the objects and on the camera settings. This blur may therefore be an important cue for learning to predict depth. In this paper, we propose a full system for single-image depth prediction in the wild using depth-from-defocus and neural networks. We carry out thorough experiments on real and simulated defocused images using a realistic model of blur variation with respect to depth. We also investigate the influence of blur on depth prediction by observing model uncertainty with a Bayesian neural network approach. From these studies, we show that out-of-focus blur greatly improves the performance of the depth-prediction network. Furthermore, we transfer the ability learned on a synthetic, indoor dataset to real, indoor and outdoor images. For this purpose, we present a new dataset with real all-focus and defocused images from a DSLR camera, paired with ground truth depth maps obtained with an active 3D sensor for indoor scenes. The proposed approach is successfully validated on both this new dataset and standard ones such as NYUv2 or Depth-in-the-Wild. Code and new datasets are available at https://github.com/marcelampc/d3net_depth_estimation.
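
To make the dependence of blur on depth concrete, the sketch below evaluates the standard thin-lens circle-of-confusion formula. It is an illustration under stated assumptions, not the blur model used in the paper, which the abstract does not specify. The function name, parameter names, and the example camera settings are hypothetical.

```python
def blur_diameter(depth_m, focus_m, focal_length_m, f_number):
    """Blur-circle diameter on the sensor (metres) under a thin-lens model.

    Assumed formula (standard thin-lens geometry, not taken from the paper):
        c = A * f * |d - d_f| / (d * (d_f - f)),  with A = f / N
    where d is the object depth, d_f the in-focus distance, f the focal
    length and N the f-number.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    if focus_m <= focal_length_m:
        raise ValueError("focus distance must exceed the focal length")
    aperture_m = focal_length_m / f_number  # aperture diameter A
    return (aperture_m * focal_length_m * abs(depth_m - focus_m)
            / (depth_m * (focus_m - focal_length_m)))


if __name__ == "__main__":
    # Hypothetical settings: 50 mm lens at f/2.8, focused at 2 m.
    for depth in (1.0, 1.5, 2.0, 3.0, 4.0, 8.0):
        c = blur_diameter(depth, focus_m=2.0, focal_length_m=0.05, f_number=2.8)
        print(f"depth {depth:4.1f} m -> blur diameter {c * 1e3:.3f} mm")
```

Under this model the blur is zero at the in-focus distance and grows on either side of it, so the blur size alone does not indicate whether an object lies in front of or behind the focal plane.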
