3-D Scene Reconstruction Using Depth from Defocus and Deep Learning

Depth estimation is becoming increasingly important in computer vision. As the commercial industry advances autonomous-vehicle research and development, these systems must gauge their 3-D surroundings in order to avoid obstacles and react to threats. Current self-driving research typically relies on LIDAR for 3-D awareness; however, as LIDAR becomes more prevalent, there is an increased risk of interference between these active measurement systems when they operate on multiple nearby vehicles. Passive methods, in contrast, do not transmit a signal to measure depth; instead, they estimate depth from cues in the scene itself. Previous research using a single-camera Depth from Defocus (DfD) system has shown that an in-focus image and an out-of-focus image of the same scene can be combined to produce a depth measure. This research introduces a new Deep Learning (DL) architecture, DfD-Net, that ingests such image pairs to produce a depth map of the scene, improving both speed and accuracy over a range of lighting conditions. Compared to the previous state-of-the-art multi-label graph-cut algorithm, DfD-Net achieves a 63.7% and a 33.6% improvement in the Normalized Root Mean Square Error (NRMSE) for the darkest and brightest images, respectively. In addition to the NRMSE, an image-quality metric, the Structural Similarity Index (SSIM), was used to assess DfD-Net performance; DfD-Net produced a 3.6% increase (improvement) in SSIM for the darkest images and a 2.3% reduction (slight decrease) for the brightest.
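The two evaluation metrics named above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's evaluation code: NRMSE is shown here normalized by the ground-truth value range (one common convention; the paper may normalize differently), and SSIM is computed as a single global statistic rather than the sliding-window average of Wang et al.'s original formulation.

```python
import numpy as np

def nrmse(pred, gt):
    """RMSE between predicted and ground-truth depth maps,
    normalized by the ground-truth value range (one common convention)."""
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    return rmse / (gt.max() - gt.min())

def global_ssim(x, y, data_range=1.0):
    """Simplified SSIM: the standard luminance/contrast/structure formula
    evaluated once over the whole image instead of per sliding window."""
    c1 = (0.01 * data_range) ** 2  # stabilizing constants from Wang et al.
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

# A perfect prediction gives NRMSE = 0 and SSIM = 1; a constant
# offset raises NRMSE while leaving structure largely intact.
gt = np.linspace(0.0, 1.0, 100).reshape(10, 10)
print(nrmse(gt, gt))               # 0.0
print(round(global_ssim(gt, gt)))  # 1
print(round(nrmse(gt + 0.1, gt), 3))  # 0.1
```

In practice a windowed SSIM implementation (e.g. `skimage.metrics.structural_similarity`) would be used; the global form above only conveys what the metric measures.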
