Pixel-Accurate Depth Evaluation in Realistic Driving Scenarios

This work introduces an evaluation benchmark for depth estimation and completion using high-resolution depth measurements with an angular resolution of up to 25" (arcseconds), comparable to a 50-megapixel camera with per-pixel depth. Existing datasets, such as the KITTI benchmark, provide only sparse reference measurements with an order of magnitude lower angular resolution; these sparse measurements are nonetheless treated as ground truth by existing depth estimation methods. We propose an evaluation methodology covering four characteristic automotive scenarios recorded under varying weather and illumination conditions (day, night, fog, rain). As a result, our benchmark allows us to evaluate the robustness of depth sensing methods in adverse weather and different driving conditions. Using the proposed evaluation data, we demonstrate that current stereo approaches provide significantly more stable depth estimates than monocular methods and lidar completion in adverse weather. Data and code are available at https://github.com/gruberto/PixelAccurateDepthBenchmark.git.
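To make the evaluation setup concrete, the sketch below computes the depth-error metrics commonly reported by such benchmarks (MAE, RMSE, and the δ < 1.25 inlier ratio) against a dense ground-truth depth map. The function name, depth range, and threshold are illustrative assumptions, not taken from the paper itself.

```python
import numpy as np

def depth_metrics(pred, gt, min_depth=0.1, max_depth=80.0):
    """Standard depth-evaluation metrics over valid ground-truth pixels.

    pred, gt: arrays of per-pixel depth in meters, same shape.
    Pixels outside (min_depth, max_depth) in the ground truth are ignored,
    as is common practice for automotive depth benchmarks.
    """
    mask = (gt > min_depth) & (gt < max_depth)
    p, g = pred[mask], gt[mask]
    ratio = np.maximum(p / g, g / p)  # symmetric relative error
    return {
        "mae": float(np.mean(np.abs(p - g))),
        "rmse": float(np.sqrt(np.mean((p - g) ** 2))),
        "delta1": float(np.mean(ratio < 1.25)),  # fraction within 25% of GT
    }
```

With dense per-pixel ground truth, these metrics can be computed over the full image rather than only at sparse lidar returns, which is the point of the proposed benchmark.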
