Semantic SLAM Based on Object Detection and Improved Octomap

Due to the development of the computer vision, machine learning, and deep learning technologies, the research community focuses not only on the traditional SLAM problems, such as geometric mapping and localization, but also on semantic SLAM. In this paper, we propose a Semantic SLAM system which builds the semantic maps with object-level entities, and it is integrated into the RGB-D SLAM framework. The system combines object detection module that is realized by the deep-learning method, and localization module with RGB-D SLAM seamlessly. In the proposed system, object detection module is used to perform object detection and recognition, and localization module is utilized to get the exact location of the camera. The two modules are integrated together to obtain the semantic maps of the environment. Furthermore, to improve the computational efficiency of the framework, an improved Octomap based on the Fast Line Rasterization Algorithm is constructed. Meanwhile, for the sake of accuracy and robustness of the semantic map, conditional random field is employed to do the optimization. Finally, we evaluate our Semantic SLAM through three different tasks, i.e., localization, object detection, and mapping. Specifically, the accuracy of localization and the mapping speed is evaluated on TUM data set. Compared with ORB-SLAM2 and original RGB-D SLAM, our system, respectively, got 72.9% and 91.2% improvements in dynamic environments localization evaluated by root-mean-square error. With the improved Octomap, the proposed Semantic SLAM is 66.5% faster than the original RGB-D SLAM. We also demonstrate the efficiency of object detection through quantitative evaluation in an automated inventory management task on a real-world data sets recorded over a realistic office.

[1]  K. Madhava Krishna,et al.  Semantic Motion Segmentation Using Dense CRF Formulation , 2014, ICVGIP.

[2]  Wolfram Burgard,et al.  G2o: A general framework for graph optimization , 2011, 2011 IEEE International Conference on Robotics and Automation.

[3]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[4]  Wei Wei,et al.  Gradient-driven parking navigation using a continuous information potential field based on wireless sensor network , 2017, Inf. Sci..

[5]  J. M. M. Montiel,et al.  ORB-SLAM: A Versatile and Accurate Monocular SLAM System , 2015, IEEE Transactions on Robotics.

[6]  Houbing Song,et al.  Imperfect Information Dynamic Stackelberg Game Based Resource Allocation Using Hidden Markov for Cloud Computing , 2018, IEEE Transactions on Services Computing.

[7]  Stefano Soatto,et al.  Class segmentation and object localization with superpixel neighborhoods , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[8]  Wei Wei,et al.  Research and Simulation of Queue Management Algorithms in Ad Hoc Networks Under DDoS Attack , 2017, IEEE Access.

[9]  Osamu Hasegawa,et al.  Random Field Model for Integration of Local Information and Global Information , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[11]  Hari M. Srivastava,et al.  A Local Fractional Integral Inequality on Fractal Space Analogous to Anderson’s Inequality , 2014 .

[12]  Roy Corrado Anati Semantic localization and mapping in robot vision , 2016 .

[13]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[14]  Wolfgang Hess,et al.  Real-time loop closure in 2D LIDAR SLAM , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[15]  Davide Scaramuzza,et al.  SVO: Fast semi-direct monocular visual odometry , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[16]  I. Biederman,et al.  Scene perception: Detecting and judging objects undergoing relational violations , 1982, Cognitive Psychology.

[17]  Yann LeCun,et al.  Indoor Semantic Segmentation using depth information , 2013, ICLR.

[18]  Martial Hebert,et al.  A hierarchical field framework for unified context-based classification , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[19]  Jörg Stückler,et al.  Dense real-time mapping of object-class semantics from RGB-D video , 2013, Journal of Real-Time Image Processing.

[20]  L. Wang,et al.  GI/Geom/1 queue based on communication model for mesh networks , 2014, Int. J. Commun. Syst..

[21]  Wolfram Burgard,et al.  Multi-Level Surface Maps for Outdoor Terrain Mapping and Loop Closing , 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[22]  Wolfram Burgard,et al.  Improved Techniques for Grid Mapping With Rao-Blackwellized Particle Filters , 2007, IEEE Transactions on Robotics.

[23]  Yong Qi,et al.  Information Potential Fields Navigation in Wireless Ad-Hoc Sensor Networks , 2011, Sensors.

[24]  Wolfram Burgard,et al.  OctoMap: an efficient probabilistic 3D mapping framework based on octrees , 2013, Autonomous Robots.

[25]  Miguel Á. Carreira-Perpiñán,et al.  Multiscale conditional random fields for image labeling , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[26]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Jing Zhang,et al.  A Bijection between Lattice-Valued Filters and Lattice-Valued Congruences in Residuated Lattices , 2013 .

[28]  Daniel Cremers,et al.  Direct Sparse Odometry , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Andrea Vedaldi,et al.  Objects in Context , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[30]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Gang Liu,et al.  Towards Cost-Effective Cloud Downloading with Tencent Big Data , 2015, Journal of Computer Science and Technology.

[32]  Andrew W. Fitzgibbon,et al.  KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera , 2011, UIST.

[33]  Antonio Criminisi,et al.  TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context , 2007, International Journal of Computer Vision.

[34]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Peiyi Shen,et al.  Combined Energy Minimization for Image Reconstruction from Few Views , 2012 .

[36]  Zhong Shao,et al.  A Fast Line Rasterization Algorithm Based on Pattern Decomposition: A Fast Line Rasterization Algorithm Based on Pattern Decomposition , 2010 .

[37]  Xiaochun Cheng,et al.  Numeric characteristics of generalized M-set with its asymptote , 2014, Appl. Math. Comput..

[38]  Surya P. N. Singh,et al.  Hybrid elevation maps: 3D surface models for segmentation , 2010, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[39]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[40]  Wei Wei,et al.  Video tamper detection based on multi-scale mutual information , 2019, Multimedia Tools and Applications.

[41]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[42]  Bin Zhou,et al.  Holes Detection in Anisotropic Sensornets: Topological Methods , 2012, Int. J. Distributed Sens. Networks.

[43]  Dorian Gálvez-López,et al.  Real-time Monocular Object SLAM , 2015, Robotics Auton. Syst..

[44]  Nassir Navab,et al.  When 2.5D is not enough: Simultaneous reconstruction, segmentation and recognition on dense SLAM , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[45]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Pushmeet Kohli,et al.  Robust Higher Order Potentials for Enforcing Label Consistency , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[47]  Daniel Cremers,et al.  LSD-SLAM: Large-Scale Direct Monocular SLAM , 2014, ECCV.

[48]  John J. Leonard,et al.  Monocular SLAM Supported Object Recognition , 2015, Robotics: Science and Systems.

[49]  Wei Wei,et al.  Energy Balance-Based Steerable Arguments Coverage Method in WSNs , 2018, IEEE Access.

[50]  Stefan Leutenegger,et al.  SemanticFusion: Dense 3D semantic mapping with convolutional neural networks , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[51]  Wolfram Burgard,et al.  An evaluation of the RGB-D SLAM system , 2012, 2012 IEEE International Conference on Robotics and Automation.

[52]  G. Klein,et al.  Parallel Tracking and Mapping for Small AR Workspaces , 2007, 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality.