MSDPN: Monocular Depth Prediction with Partial Laser Observation using Multi-stage Neural Networks

In this study, we propose a deep-learning-based multi-stage network architecture, the Multi-Stage Depth Prediction Network (MSDPN), to predict a dense depth map from a 2D LiDAR and a monocular camera. The proposed network combines a multi-stage encoder-decoder architecture with Cross Stage Feature Aggregation (CSFA). The multi-stage encoder-decoder architecture alleviates the partial observation problem caused by the single scan plane of a 2D LiDAR, while CSFA prevents features from being diluted as they pass through the stages and helps the network learn the inter-spatial relationships between features. Previous works use data sub-sampled from the ground truth as input rather than actual 2D LiDAR measurements; in contrast, we train and evaluate our model on physically collected 2D LiDAR data. To this end, we acquired our own dataset, the KAIST RGBD-scan dataset, and validated the effectiveness and robustness of MSDPN under realistic conditions. As verified experimentally, our network performs favorably against state-of-the-art methods. Additionally, we compared different input methods and confirmed that the reference depth map is robust in untrained scenarios.
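To make the architecture described above concrete, the following is a minimal PyTorch sketch of a two-stage encoder-decoder cascade with cross-stage feature aggregation. The stage count, layer widths, input resolution, and the additive fusion rule are illustrative assumptions for exposition, not the actual MSDPN configuration, which the abstract does not specify.

```python
# Minimal sketch of a two-stage encoder-decoder with cross-stage
# feature aggregation (CSFA-style), in the spirit of MSDPN.
# All layer sizes, the stage count, and the additive fusion rule
# are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class EncoderDecoder(nn.Module):
    """One hourglass stage: downsample twice, then upsample back."""
    def __init__(self, in_ch, feat_ch):
        super().__init__()
        self.enc1 = conv_block(in_ch, feat_ch, stride=2)    # 1/2 res
        self.enc2 = conv_block(feat_ch, feat_ch, stride=2)  # 1/4 res
        self.dec2 = conv_block(feat_ch, feat_ch)
        self.dec1 = conv_block(feat_ch, feat_ch)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)

    def forward(self, x, skip=None):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        # Cross-stage aggregation: re-inject encoder features kept
        # from the previous stage so they are not diluted along the
        # cascade (additive fusion assumed here).
        if skip is not None:
            e1 = e1 + skip["e1"]
            e2 = e2 + skip["e2"]
        d2 = self.up(self.dec2(e2)) + e1   # intra-stage skip
        d1 = self.up(self.dec1(d2))
        return d1, {"e1": e1, "e2": e2}

class TwoStageDepthNet(nn.Module):
    """RGB (3 ch) + sparse 2D-LiDAR depth (1 ch) -> dense depth."""
    def __init__(self, feat_ch=32):
        super().__init__()
        self.stage1 = EncoderDecoder(4, feat_ch)
        self.stage2 = EncoderDecoder(feat_ch, feat_ch)
        self.head = nn.Conv2d(feat_ch, 1, 1)

    def forward(self, rgb, sparse_depth):
        x = torch.cat([rgb, sparse_depth], dim=1)
        f1, feats = self.stage1(x)
        f2, _ = self.stage2(f1, skip=feats)
        return self.head(f2)

if __name__ == "__main__":
    net = TwoStageDepthNet()
    rgb = torch.randn(1, 3, 64, 128)
    scan = torch.randn(1, 1, 64, 128)   # 2D scan projected to image
    print(net(rgb, scan).shape)          # torch.Size([1, 1, 64, 128])
```

The key design point this sketch illustrates is that encoder features from the earlier stage are fused into the matching resolutions of the later stage, so information recovered from the sparse LiDAR observation is preserved rather than diluted as it propagates through the cascade.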
