Unsupervised learning to detect loops using deep neural networks for visual SLAM system

This paper addresses the loop closure detection problem for visual simultaneous localization and mapping (SLAM) systems. We propose a novel approach based on the stacked denoising auto-encoder (SDA), a multi-layer neural network that autonomously learns a compressed representation from raw input data in an unsupervised way. Unlike traditional bag-of-words based methods, the deep network can learn the complex inner structures in image data and no longer requires manually designed visual features. Our approach exploits these characteristics of the SDA to solve the loop detection problem. We present the workflow of training the network, utilizing the learned features, and computing the similarity score. The performance of the SDA is evaluated in a comparison study with FAB-MAP 2.0 using data from open datasets and physical robots. The results show that the SDA can detect loops with satisfactory precision and can therefore provide an alternative approach for visual SLAM systems.
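The workflow named in the abstract (train the network, extract features, score similarity) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the tied weights, masking noise, squared-error loss, layer sizes, learning rate, and the use of cosine similarity as the score are all assumptions made for illustration, and the names `DenoisingAutoencoder`, `train_stack`, `extract_features`, and `similarity` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class DenoisingAutoencoder:
    """One layer of a stacked denoising auto-encoder (tied weights, assumed)."""

    def __init__(self, n_visible, n_hidden, corruption=0.3, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.1, size=(n_visible, n_hidden))
        self.b_hidden = np.zeros(n_hidden)
        self.b_visible = np.zeros(n_visible)
        self.corruption = corruption  # fraction of inputs zeroed during training

    def encode(self, x):
        return sigmoid(x @ self.W + self.b_hidden)

    def decode(self, h):
        return sigmoid(h @ self.W.T + self.b_visible)

    def train_step(self, x, lr=0.5):
        """One full-batch gradient step; returns the mean squared error."""
        # Corrupt the input with masking noise, then reconstruct the clean input.
        mask = self.rng.random(x.shape) > self.corruption
        x_noisy = x * mask
        h = self.encode(x_noisy)
        x_rec = self.decode(h)
        err = x_rec - x
        # Backpropagate the squared error through both sigmoid layers.
        d_out = err * x_rec * (1.0 - x_rec)
        d_hid = (d_out @ self.W) * h * (1.0 - h)
        # Tied weights: encoder and decoder gradients are summed.
        grad_W = x_noisy.T @ d_hid + d_out.T @ h
        self.W -= lr * grad_W / x.shape[0]
        self.b_visible -= lr * d_out.mean(axis=0)
        self.b_hidden -= lr * d_hid.mean(axis=0)
        return float(np.mean(err ** 2))


def train_stack(data, layer_sizes, epochs=200, lr=0.5, seed=0):
    """Greedy layer-wise training: each layer learns from the previous codes."""
    layers, inputs = [], data
    for i, n_hidden in enumerate(layer_sizes):
        layer = DenoisingAutoencoder(inputs.shape[1], n_hidden, seed=seed + i)
        for _ in range(epochs):
            layer.train_step(inputs, lr)
        layers.append(layer)
        inputs = layer.encode(inputs)
    return layers


def extract_features(layers, x):
    """Pass (uncorrupted) inputs through every encoder to get the final code."""
    for layer in layers:
        x = layer.encode(x)
    return x


def similarity(a, b):
    """Cosine similarity between two feature vectors (assumed score)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


if __name__ == "__main__":
    # Synthetic stand-in for vectorized image frames with values in [0, 1].
    rng = np.random.default_rng(1)
    prototypes = (rng.random((4, 64)) > 0.5).astype(float)
    frames = prototypes[rng.integers(0, 4, size=80)]
    layers = train_stack(frames, layer_sizes=[32, 16])
    features = extract_features(layers, frames)
    # A loop candidate would be any earlier frame whose score exceeds a threshold.
    print(similarity(features[0], features[1]))
```

In a sketch like this, a frame pair would be flagged as a loop candidate when its score exceeds a chosen threshold; the threshold and the exact scoring function would need to be taken from the paper itself.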
