Fusing Semantics and Motion State Detection for Robust Visual SLAM

Robust pose tracking and mapping in highly dynamic environments remains a major challenge for existing visual SLAM (vSLAM) systems. In this paper, we increase the robustness of existing vSLAM by accurately removing moving objects from the scene so that they do not contribute to pose estimation and mapping. Specifically, semantic information is fused with the motion states of the scene in a probabilistic framework, which enables accurate and robust extraction of moving objects while retaining the features that are useful for pose estimation and mapping. Our work highlights the importance of distinguishing between the motion states of potentially moving objects for vSLAM in highly dynamic environments. The proposed method can be integrated into existing vSLAM systems to increase their robustness in dynamic environments without incurring much computational cost. We provide extensive experimental results on three well-known datasets showing that the proposed technique outperforms existing vSLAM methods in indoor and outdoor environments, across scenarios that include crowded scenes.
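To make the idea of fusing semantics with motion states concrete, the following is a minimal illustrative sketch and not the formulation used in the paper. It assumes each feature point carries a semantic prior (the probability that its class is dynamic, e.g. high for "person", low for "wall") and a geometric residual in pixels measured against the estimated camera motion. The Gaussian residual model, the function names, `sigma_px`, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def motion_likelihoods(residual_px, sigma_px=1.0):
    """Hypothetical motion model: small residuals are likely under 'static'.

    Returns (likelihood given dynamic, likelihood given static).
    """
    r = np.asarray(residual_px, dtype=float)
    static = np.exp(-0.5 * (r / sigma_px) ** 2)
    dynamic = 1.0 - static
    return dynamic, static


def fuse_dynamic_probability(semantic_prior, residual_px, sigma_px=1.0):
    """Bayes' rule: P(dynamic | motion) from a semantic prior and a motion likelihood."""
    prior = np.asarray(semantic_prior, dtype=float)
    lik_dyn, lik_sta = motion_likelihoods(residual_px, sigma_px)
    num = lik_dyn * prior
    den = num + lik_sta * (1.0 - prior)
    # Fall back to the prior where the evidence is degenerate (den == 0).
    return np.divide(num, den, out=prior.copy(), where=den > 0)


def select_static_features(semantic_prior, residual_px, threshold=0.5):
    """Boolean mask of features to keep for pose estimation and mapping."""
    posterior = fuse_dynamic_probability(semantic_prior, residual_px)
    return posterior < threshold


if __name__ == "__main__":
    # Two points on a 'person' (one motionless, one moving) and one on a wall.
    priors = [0.9, 0.9, 0.05]
    residuals = [0.0, 3.0, 0.1]
    print(select_static_features(priors, residuals))
```

Under these assumptions, a point on a potentially moving object is kept when its observed motion is consistent with a static scene, which reflects the point made above about distinguishing motion states rather than discarding every feature of a movable class.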
