论文信息 - A Front-End for Dense Monocular SLAM using a Learned Outlier Mask Prior

A Front-End for Dense Monocular SLAM using a Learned Outlier Mask Prior

Recent achievements in depth prediction from a single RGB image have powered the new research area of combining convolutional neural networks (CNNs) with classical simultaneous localization and mapping (SLAM) algorithms. The depth prediction from a CNN provides a reasonable initial point in the optimization process in the traditional SLAM algorithms, while the SLAM algorithms further improve the CNN prediction online. However, most of the current CNN-SLAM approaches have only taken advantage of the depth prediction but not yet other products from a CNN. In this work, we explore the use of the outlier mask, a by-product from unsupervised learning of depth from video, as a prior in a classical probability model for depth estimate fusion to step up the outlier-resistant tracking performance of a SLAM front-end. On the other hand, some of the previous CNN-SLAM work builds on feature-based sparse SLAM methods, wasting the per-pixel dense prediction from a CNN. In contrast to these sparse methods, we devise a dense CNN-assisted SLAM frontend that is implementable with TensorFlow and evaluate it on both indoor and outdoor datasets.

Yihao Zhang | John J. Leonard | J. Leonard | Yihao Zhang

[1] Chunhua Shen,et al. Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video , 2019, NeurIPS.

[2] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.

[3] Davide Scaramuzza,et al. REMODE: Probabilistic, monocular dense reconstruction in real time , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[4] Oisin Mac Aodha,et al. Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Nan Yang,et al. D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Thomas Brox,et al. A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Sebastian Ramos,et al. The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Alex Kendall,et al. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? , 2017, NIPS.

[9] Stefan Leutenegger,et al. SemanticFusion: Dense 3D semantic mapping with convolutional neural networks , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[10] Javier Civera,et al. Inverse Depth Parametrization for Monocular SLAM , 2008, IEEE Transactions on Robotics.

[11] Juan D. Tardós,et al. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras , 2016, IEEE Transactions on Robotics.

[12] Andrew J. Davison,et al. DTAM: Dense tracking and mapping in real-time , 2011, 2011 International Conference on Computer Vision.

[13] Andreas Geiger,et al. Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[14] Matthias Nießner,et al. ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Rob Fergus,et al. Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[16] Federico Tombari,et al. CNN-SLAM: Real-Time Dense Monocular SLAM with Learned Depth Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Daniel Cremers,et al. Semi-dense Visual Odometry for a Monocular Camera , 2013, 2013 IEEE International Conference on Computer Vision.

[18] Anelia Angelova,et al. Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19] Carlos Hernández,et al. Video-based, real-time multi-view stereo , 2011, Image Vis. Comput..

[20] Andrew J. Davison,et al. DeepFactors: Real-Time Probabilistic Dense Monocular SLAM , 2020, IEEE Robotics and Automation Letters.

[21] Daniel Cremers,et al. LSD-SLAM: Large-Scale Direct Monocular SLAM , 2014, ECCV.

[22] Yihao Zhang,et al. Bootstrapped Self-Supervised Training with Monocular Video for Semantic Segmentation and Depth Estimation , 2021, 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[23] Noah Snavely,et al. Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Javier Civera,et al. DynaSLAM: Tracking, Mapping, and Inpainting in Dynamic Scenes , 2018, IEEE Robotics and Automation Letters.

[25] Jörg Stückler,et al. Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry , 2018, ECCV.

[26] Wolfram Burgard,et al. A benchmark for the evaluation of RGB-D SLAM systems , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[27] John J. Leonard,et al. Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age , 2016, IEEE Transactions on Robotics.

[28] Stefan Leutenegger,et al. SceneCode: Monocular Dense Semantic Reconstruction Using Learned Encoded Scene Representations , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29] Nassir Navab,et al. Deeper Depth Prediction with Fully Convolutional Residual Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[30] Gabriel J. Brostow,et al. Digging Into Self-Supervised Monocular Depth Estimation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[31] G. Klein,et al. Parallel Tracking and Mapping for Small AR Workspaces , 2007, 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality.

[32] Daniel Cremers,et al. Direct Sparse Odometry , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33] Syamsiah Mashohor,et al. CNN-SVO: Improving the Mapping in Semi-Direct Visual Odometry Using Single-Image Depth Prediction , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[34] Stefan Leutenegger,et al. CodeSLAM - Learning a Compact, Optimisable Representation for Dense Visual SLAM , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35] Davide Scaramuzza,et al. SVO: Fast semi-direct monocular visual odometry , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).