LiDARTouch: Monocular metric depth estimation with a few-beam LiDAR

Vision-based depth estimation is a key capability for autonomous systems, which often rely on a single camera or several independent ones. In such a monocular setup, dense depth is obtained either with additional input from one or several expensive LiDARs, e.g., with 64 beams, or with camera-only methods, which suffer from scale-ambiguity and infinite-depth problems. In this paper, we propose a new alternative: densely estimating metric depth by combining a monocular camera with a lightweight LiDAR, e.g., with 4 beams, typical of today’s automotive-grade mass-produced laser scanners. Inspired by recent self-supervised methods, we introduce a novel framework, called LiDARTouch, to estimate dense depth maps from monocular images with the help of “touches” of LiDAR, i.e., without the need for dense ground-truth depth. In our setup, the minimal LiDAR input contributes on three different levels: as an additional input to the model, in a self-supervised LiDAR reconstruction objective, and in the estimation of pose changes (a key component of self-supervised depth estimation architectures). Our LiDARTouch framework achieves a new state of the art in self-supervised depth estimation on the KITTI dataset, supporting our choice of integrating the very sparse LiDAR signal with other visual features. Moreover, we show that the use of a few-beam LiDAR alleviates the scale-ambiguity and infinite-depth issues that camera-only methods suffer from. We also demonstrate that methods from the fully supervised depth-completion literature can be adapted to a self-supervised regime with a minimal LiDAR signal.
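The abstract's second contribution level, the self-supervised LiDAR reconstruction objective, can be illustrated with a minimal sketch. This is not the paper's actual implementation: the function names, the plain L1 photometric term, and the weight `w_lidar` are illustrative assumptions. The key idea is that the sparse LiDAR term is evaluated only at pixels where a few-beam return exists, which is what anchors the otherwise scale-ambiguous photometric loss to metric scale:

```python
import numpy as np

def lidar_reconstruction_loss(pred_depth, lidar_depth):
    """L1 error between predicted depth and the sparse few-beam LiDAR
    map, evaluated only where a LiDAR return exists (depth > 0)."""
    mask = lidar_depth > 0
    if not mask.any():
        return 0.0
    return float(np.abs(pred_depth[mask] - lidar_depth[mask]).mean())

def photometric_l1(target, warped):
    """Per-pixel L1 photometric error between the target frame and a
    neighboring frame warped into the target view via depth and pose."""
    return float(np.abs(target - warped).mean())

def total_loss(target, warped, pred_depth, lidar_depth, w_lidar=0.5):
    """Combined self-supervised objective: image reconstruction plus a
    sparse LiDAR term; only the latter carries metric-scale information."""
    return photometric_l1(target, warped) + \
        w_lidar * lidar_reconstruction_loss(pred_depth, lidar_depth)
```

Note how few LiDAR pixels are needed: with a 4-beam scanner only a small fraction of the image receives a return, yet the masked L1 term still fixes the global scale of the depth map.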
