Deep Optics for Monocular Depth Estimation and 3D Object Detection

Depth estimation and 3D object detection are critical for scene understanding but remain challenging to perform with a single image due to the loss of 3D information during image capture. Recent models using deep neural networks have improved monocular depth estimation performance, but there is still difficulty in predicting absolute depth and generalizing outside a standard dataset. Here we introduce the paradigm of deep optics, i.e. end-to-end design of optics and image processing, to the monocular depth estimation problem, using coded defocus blur as an additional depth cue to be decoded by a neural network. We evaluate several optical coding strategies along with an end-to-end optimization scheme for depth estimation on three datasets, including NYU Depth v2 and KITTI. We find an optimized freeform lens design yields the best results, but chromatic aberration from a singlet lens offers significantly improved performance as well. We build a physical prototype and validate that chromatic aberrations improve depth estimation on real-world results. In addition, we train object detection networks on the KITTI dataset and show that the lens optimized for depth estimation also results in improved 3D object detection performance.

[1]  Leonidas J. Guibas,et al.  Volumetric and Multi-view CNNs for Object Classification on 3D Data , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Ramesh Raskar,et al.  Dappled photography: mask enhanced cameras for heterodyned light fields and coded aperture refocusing , 2007, ACM Trans. Graph..

[3]  Yin Zhou,et al.  VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  Dieter Fox,et al.  RGB-(D) scene labeling: Features and algorithms , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[6]  Nassir Navab,et al.  Deeper Depth Prediction with Fully Convolutional Residual Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[7]  Jitendra Malik,et al.  Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Jitendra Malik,et al.  Learning Rich Features from RGB-D Images for Object Detection and Segmentation , 2014, ECCV.

[9]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[10]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[11]  Alex Pentland,et al.  A New Sense for Depth of Field , 1985, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Leonidas J. Guibas,et al.  Frustum PointNets for 3D Object Detection from RGB-D Data , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Jianxiong Xiao,et al.  Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Yan Lu,et al.  MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization , 2018, AAAI.

[15]  Stephen P. Boyd,et al.  End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging , 2018, ACM Trans. Graph..

[16]  Jianxiong Xiao,et al.  Sliding Shapes for 3D Object Detection in Depth Images , 2014, ECCV.

[17]  Berthold K. P. Horn Obtaining shape from shading information , 1989 .

[18]  Marc Pollefeys,et al.  Pulling Things out of Perspective , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Gordon Wetzstein,et al.  Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification , 2018, Scientific Reports.

[20]  Ashutosh Saxena,et al.  Learning Depth from Single Monocular Images , 2005, NIPS.

[21]  Wolfgang Heidrich,et al.  High-quality computational imaging through simple lenses , 2013, TOGS.

[22]  Frédéric Champagnat,et al.  Passive depth estimation using chromatic aberration and a depth from defocus approach. , 2013, Applied optics.

[23]  Sanja Fidler,et al.  Monocular 3D Object Detection for Autonomous Driving , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Ashok Veeraraghavan,et al.  PhaseCam3D — Learning Phase Masks for Passive Single View Depth Estimation , 2019, 2019 IEEE International Conference on Computational Photography (ICCP).

[25]  Jitendra Malik,et al.  Inferring spatial layout from a single image via depth-ordered grouping , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[26]  Oisin Mac Aodha,et al.  Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[28]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Sebastian Scherer,et al.  VoxNet: A 3D Convolutional Neural Network for real-time object recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[30]  Sertac Karaman,et al.  Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[31]  Nicu Sebe,et al.  Multi-scale Continuous CRFs as Sequential Deep Networks for Monocular Depth Estimation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Raja Giryes,et al.  Depth Estimation From a Single Image Using Deep Learned Phase Coded Mask , 2018, IEEE Transactions on Computational Imaging.

[33]  Alexei A. Efros,et al.  Recovering Surface Layout from an Image , 2007, International Journal of Computer Vision.

[34]  Tomer Michaeli,et al.  Multicolor localization microscopy by deep learning , 2018, 1807.01637.

[35]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[36]  Oliver Cossairt,et al.  Spectral Focal Sweep: Extended depth of field from chromatic aberrations , 2010, 2010 IEEE International Conference on Computational Photography (ICCP).

[37]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Abhinav Gupta,et al.  Building Part-Based Object Detectors via 3D Geometry , 2013, 2013 IEEE International Conference on Computer Vision.

[39]  R. Noll Zernike polynomials and atmospheric turbulence , 1976 .

[40]  Zhanyi Hu,et al.  Learning Depth From Single Images With Deep Neural Network Embedding Focal Length , 2018, IEEE Transactions on Image Processing.

[41]  Frédo Durand,et al.  4D frequency analysis of computational cameras for depth of field extension , 2009, SIGGRAPH '09.

[42]  Hiroshi Murase,et al.  Illumination planning for object recognition in structured environments , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Dacheng Tao,et al.  Deep Ordinal Regression Network for Monocular Depth Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[44]  J. Goodman Introduction to Fourier optics , 1969 .

[45]  Bin Xu,et al.  Multi-level Fusion Based 3D Object Detection from Monocular Images , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[46]  Jian Zhang,et al.  Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors , 2013, 2013 IEEE International Conference on Computer Vision.

[47]  Sanja Fidler,et al.  Holistic Scene Understanding for 3D Object Detection with RGBD Cameras , 2013, 2013 IEEE International Conference on Computer Vision.

[48]  Kiriakos N. Kutulakos,et al.  A Layer-Based Restoration Framework for Variable-Aperture Photography , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[49]  Frédo Durand,et al.  Image and depth from a conventional camera with a coded aperture , 2007, SIGGRAPH 2007.

[50]  Stephen Lin,et al.  Coded Aperture Pairs for Depth from Defocus and Defocus Deblurring , 2011, International Journal of Computer Vision.

[51]  Frédéric Champagnat,et al.  Deep Depth from Defocus: how can defocus blur improve 3D estimation using dense neural networks? , 2018, ECCV Workshops.