Paper information - ORB-SLAM-CNN: Lessons in Adding Semantic Map Construction to Feature-Based SLAM

Recent work has integrated semantics into the 3D scene models produced by visual SLAM systems. Though these systems operate close to real time, what is lacking is a study of how to achieve real-time performance by trading off semantic model accuracy against computational requirements. ORB-SLAM2 provides good scene accuracy and real-time processing without requiring GPUs [1]. Following a ‘single view’ approach of overlaying a dense semantic map on the sparse SLAM scene model, we explore a method for automatically tuning the parameters of the system so that it operates in real time while maximizing prediction accuracy and map density.

Gavin Brown | Andrew M. Webb | Mikel Luján

[1] Juan D. Tardós, et al. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras, 2016, IEEE Transactions on Robotics.

[2] Bo Chen, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, 2017, ArXiv.

[3] John J. Leonard, et al. Monocular SLAM Supported Object Recognition, 2015, Robotics: Science and Systems.

[4] Michael F. P. O'Boyle, et al. SLAMBench2: Multi-Objective Head-to-Head Benchmarking for Visual SLAM, 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[5] Matthias Nießner, et al. ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence, 2002, Neural Computation.

[7] Vibhav Vineet, et al. Conditional Random Fields as Recurrent Neural Networks, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[8] Stefan Leutenegger, et al. ElasticFusion: Dense SLAM Without A Pose Graph, 2015, Robotics: Science and Systems.

[9] Gordon Wyeth, et al. Place categorization and semantic mapping on a mobile robot, 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[10] Vladlen Koltun, et al. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, 2011, NIPS.

[11] Patrick Pérez, et al. Incremental dense semantic stereo fusion for large-scale semantic scene reconstruction, 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[12] Xuanpeng Li, et al. Semi-Dense 3D Semantic Mapping from Monocular SLAM, 2016, ArXiv.

[13] Gary R. Bradski, et al. ORB: An efficient alternative to SIFT or SURF, 2011, 2011 International Conference on Computer Vision.

[14] James M. Rehg, et al. Joint Semantic Segmentation and 3D Reconstruction from Monocular Video, 2014, ECCV.

[15] Stefan Leutenegger, et al. SemanticFusion: Dense 3D semantic mapping with convolutional neural networks, 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[16] Marc Pollefeys, et al. Joint 3D Scene Reconstruction and Class Segmentation, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[17] Iasonas Kokkinos, et al. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, 2014, ICLR.