Improved Deep Distance Learning for Visual Loop Closure Detection in Smart City

Visual Simultaneous Localization and Mapping (vSLAM) is expected to advance Smart City initiatives such as driverless cars and intelligent robots. Loop closure detection (LCD) is an important module in a vSLAM system. Existing works based on convolutional neural networks perform better at feature extraction, but better feature extraction alone is far from enough. Given the characteristics of LCD, it is important to have a customized loss function and a method for constructing suitable training image sets. Motivated by this, we propose a novel framework for LCD. Through a deep analysis of the distance relationships in the LCD problem, we propose the multi-tuplet clusters loss function together with a mini-batch construction scheme. The proposed framework maps images to a low-dimensional space and extracts more discriminative image features, which helps it learn the essential distance relationships of the LCD problem. Extensive evaluations demonstrate that our method outperforms many state-of-the-art approaches, even in complex environments with strong appearance changes. Importantly, although the training process is computationally demanding, online application is very efficient.
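To illustrate why online use of a learned low-dimensional embedding can be cheap, the following is a minimal sketch of distance-based loop detection, not the method of this paper. It assumes each image has already been mapped to a short descriptor vector by some trained network; the function names `l2_distance` and `detect_loop`, the `threshold` value, and the `min_gap` parameter are hypothetical and chosen only for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def detect_loop(query, past_descriptors, threshold, min_gap=0):
    """Return (index, distance) of the closest past descriptor whose distance
    to `query` is at most `threshold`, or None if there is no such descriptor.

    `min_gap` (an assumption of this sketch) skips the most recent frames,
    which tend to look similar to the current one without being a true loop.
    """
    usable = max(len(past_descriptors) - min_gap, 0)
    best = None
    for i, descriptor in enumerate(past_descriptors[:usable]):
        dist = l2_distance(query, descriptor)
        if dist <= threshold and (best is None or dist < best[1]):
            best = (i, dist)
    return best


# Toy 2-D descriptors standing in for learned image embeddings.
past = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
match = detect_loop([0.9, 0.1], past, threshold=0.3)
print(match)
```

In this toy example the query is closest to the descriptor at index 1, so that frame is reported as the loop candidate. The cost per query is one distance computation per stored descriptor, which stays small when the descriptors are low-dimensional.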
