Learning binary features online from motion dynamics for incremental loop-closure detection and place recognition

This paper proposes a simple yet effective approach to learning visual features online to improve loop-closure detection and place recognition within bag-of-words frameworks. The approach learns each codeword of the bag-of-words model from a pair of features matched across two consecutive frames, so that the codeword acquires a temporally derived perspective invariance to camera motion. The learning algorithm is efficient: the binary descriptor is generated from the mean of the two matched image patches, and the mask is learned through a discriminative projection that minimizes the intra-class distances between the learned feature and the two original features. A codeword packages the learned descriptor together with its mask, and a masked Hamming distance is defined to measure the distance between two codewords. The geometric properties of the learned codewords are then justified mathematically. In addition, hypothesis constraints are imposed through temporal consistency of matched codewords, which improves precision. The approach, integrated into an incremental bag-of-words system, is validated on multiple benchmark data sets and compared with state-of-the-art methods. Experiments show improved precision/recall over the state of the art with little additional runtime.
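The codeword pipeline can be illustrated with a small Python sketch. It is not the authors' implementation, and the abstract leaves the sampling pattern, the mask optimization and the distance normalization unspecified, so several pieces are assumptions. The descriptor uses a BRIEF-style random pixel-pair test. The mask rule is a simplified stand-in for the discriminative projection: it keeps only the bits on which the learned descriptor agrees with both original descriptors. The masked Hamming distance compares only bits retained by both masks. The names `PATCH`, `N_BITS`, `PAIRS`, `brief_bits`, `Codeword`, `learn_codeword` and `masked_hamming` are all hypothetical.

```python
import random
from dataclasses import dataclass

PATCH = 16   # assumed patch side length in pixels (hypothetical)
N_BITS = 64  # assumed descriptor length in bits (hypothetical)

# Hypothetical BRIEF-style sampling pattern: fixed random pixel-pair tests.
_rng = random.Random(0)
PAIRS = [((_rng.randrange(PATCH), _rng.randrange(PATCH)),
          (_rng.randrange(PATCH), _rng.randrange(PATCH)))
         for _ in range(N_BITS)]


def brief_bits(patch):
    """Binary descriptor packed into an int: bit i is 1 if I(p_i) < I(q_i)."""
    bits = 0
    for i, ((r1, c1), (r2, c2)) in enumerate(PAIRS):
        if patch[r1][c1] < patch[r2][c2]:
            bits |= 1 << i
    return bits


@dataclass(frozen=True)
class Codeword:
    descriptor: int  # learned binary descriptor
    mask: int        # bit set to 1 means the bit is used when matching


def learn_codeword(patch_a, patch_b):
    """Learn a codeword from two patches matched across consecutive frames."""
    # Descriptor of the mean patch of the matched pair.
    mean_patch = [[(a + b) / 2 for a, b in zip(row_a, row_b)]
                  for row_a, row_b in zip(patch_a, patch_b)]
    d = brief_bits(mean_patch)
    d_a, d_b = brief_bits(patch_a), brief_bits(patch_b)
    # Simplified mask (assumption, not the paper's projection): keep only bits
    # where the learned descriptor agrees with both original descriptors, so the
    # masked intra-class distance to each original is zero.
    full = (1 << N_BITS) - 1
    mask = full & ~((d ^ d_a) | (d ^ d_b))
    return Codeword(d, mask)


def masked_hamming(u, v):
    """Count differing bits among those retained by both codewords' masks."""
    return bin((u.descriptor ^ v.descriptor) & u.mask & v.mask).count("1")
```

Taking the mean patch preserves every pixel ordering on which the two views agree. As a result, the mask in this sketch reduces to the bits where the two original descriptors coincide. Bits that flip between consecutive frames, which are the bits most sensitive to the camera motion, are therefore excluded from matching.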
