DDL-SLAM: A Robust RGB-D SLAM in Dynamic Environments Combined With Deep Learning

Visual Simultaneous Localization and Mapping (VSLAM) has become a fundamental capability of robots over the past few decades, and many impressive open-source SLAM systems exist. However, most current SLAM theories and approaches rely on the static-scene assumption, which rarely holds in practice because moving objects are ubiquitous and unavoidable in most environments. This paper proposes DDL-SLAM (Dynamic Deep Learning SLAM), a robust RGB-D SLAM system for dynamic scenarios that builds on ORB-SLAM2 and adds dynamic object segmentation and background inpainting. Moving objects are detected using both semantic segmentation and multi-view geometry. A map of the static scene then allows the system to inpaint the frame background that moving objects have occluded, which greatly improves localization accuracy in dynamic environments. Experiments on a public RGB-D benchmark dataset show that DDL-SLAM significantly improves the robustness and stability of RGB-D SLAM in highly dynamic environments.
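
The idea of detecting moving objects from two cues can be illustrated with a minimal sketch. It assumes that a semantic segmentation network and a multi-view geometry check have each already produced a per-pixel boolean mask of the same size, and that the two are merged by a simple union before feature points are filtered. The union rule, the function names `combine_dynamic_masks` and `filter_static_keypoints`, and the mask inputs are all illustrative assumptions; the abstract does not specify how DDL-SLAM fuses the two cues.

```python
import numpy as np


def combine_dynamic_masks(semantic_mask, geometric_mask):
    """Merge two boolean masks (H x W) into one dynamic-region mask.

    Assumption: a pixel counts as dynamic if either cue flags it.
    """
    return np.logical_or(semantic_mask, geometric_mask)


def filter_static_keypoints(keypoints, dynamic_mask):
    """Keep only keypoints that fall on pixels not marked as dynamic.

    keypoints: array-like of shape (N, 2) holding (x, y) pixel coordinates.
    Keypoints outside the image bounds are discarded.
    """
    kp = np.asarray(keypoints, dtype=float).reshape(-1, 2)
    xs = np.round(kp[:, 0]).astype(int)
    ys = np.round(kp[:, 1]).astype(int)
    height, width = dynamic_mask.shape

    in_bounds = (xs >= 0) & (xs < width) & (ys >= 0) & (ys < height)
    keep = np.zeros(len(kp), dtype=bool)
    # Masks are indexed as [row, column], i.e. [y, x].
    keep[in_bounds] = ~dynamic_mask[ys[in_bounds], xs[in_bounds]]
    return kp[keep]


# Hypothetical 4 x 4 example: each cue flags one pixel.
semantic = np.zeros((4, 4), dtype=bool)
semantic[1, 1] = True   # e.g. a pixel labeled as a person
geometric = np.zeros((4, 4), dtype=bool)
geometric[2, 3] = True  # e.g. a pixel that fails a geometric consistency check

dynamic = combine_dynamic_masks(semantic, geometric)
static_kp = filter_static_keypoints([(1, 1), (3, 2), (0, 0), (10, 10)], dynamic)
```

In this toy input only the keypoint at (0, 0) survives: two keypoints lie on flagged pixels and one lies outside the image. A tracker would then estimate the camera pose from the surviving static points only.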
