Visual place recognition using HMM sequence matching

Visual place recognition and loop closure is critical for the global accuracy of visual Simultaneous Localization and Mapping (SLAM) systems. We present a place recognition algorithm which operates by matching local query image sequences to a database of image sequences. To match sequences, we calculate a matrix of low-resolution, contrast-enhanced image similarity probability values. The optimal sequence alignment, which can be viewed as a discontinuous path through the matrix, is found using a Hidden Markov Model (HMM) framework reminiscent of Dynamic Time Warping from speech recognition. The state transitions enforce local velocity constraints and the most likely path sequence is recovered efficiently using the Viterbi algorithm. A rank reduction on the similarity probability matrix is used to provide additional robustness in challenging conditions when scoring sequence matches. We evaluate our approach on seven outdoor vision datasets and show improved precision-recall performance against the recently published seqSLAM algorithm.

[1]  Michael Milford,et al.  Vision-based place recognition: how low can you go? , 2013, Int. J. Robotics Res..

[2]  Paul Newman,et al.  Appearance-only SLAM at large scale with FAB-MAP 2.0 , 2011, Int. J. Robotics Res..

[3]  Francisco Angel Moreno,et al.  The Málaga urban dataset: High-rate stereo and LiDAR in a realistic urban scenario , 2014, Int. J. Robotics Res..

[4]  Wolfram Burgard,et al.  G2o: A general framework for graph optimization , 2011, 2011 IEEE International Conference on Robotics and Automation.

[5]  Binoy Pinto,et al.  Speeded Up Robust Features , 2011 .

[6]  Paul Newman,et al.  FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance , 2008, Int. J. Robotics Res..

[7]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[8]  Michael Milford,et al.  Towards persistent visual navigation using SMART , 2013, ICRA 2013.

[9]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[10]  Brett Browning,et al.  Orientation only loop-closing with closed-form trajectory bending , 2012, 2012 IEEE International Conference on Robotics and Automation.

[11]  Gordon Wyeth,et al.  SeqSLAM: Visual route-based navigation for sunny summer days and stormy winter nights , 2012, 2012 IEEE International Conference on Robotics and Automation.

[12]  Andrew J. Viterbi,et al.  Error bounds for convolutional codes and an asymptotically optimum decoding algorithm , 1967, IEEE Trans. Inf. Theory.

[13]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[14]  Gordon Wyeth,et al.  FAB-MAP + RatSLAM: Appearance-based SLAM for multiple times of day , 2010, 2010 IEEE International Conference on Robotics and Automation.

[15]  Paul Newman,et al.  Detecting Loop Closure with Scene Sequences , 2007, International Journal of Computer Vision.

[16]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..