Feature-Based and Convolutional Neural Network Fusion Method for Visual Relocalization

Relocalization is an essential module for mobile robots that operate autonomously in an environment over long periods. Current visual relocalization algorithms fall mainly into feature-based methods and CNN-based (Convolutional Neural Network) methods. Feature-based methods achieve high localization accuracy in feature-rich scenes, but their error grows large, or they fail outright, under motion blur, in textureless scenes, and under changing viewing angles. CNN-based methods are usually more robust but less accurate. This paper therefore proposes a visual relocalization algorithm that combines the advantages of the two approaches. The BoVW (Bag of Visual Words) model is used to retrieve the most similar image in the training dataset, and PnP (Perspective-n-Point) with RANSAC (Random Sample Consensus) is employed to estimate an initial pose. The number of inliers is then used as the criterion for deciding whether the feature-based result or the CNN-based result is used. Compared with PoseNet, a previous CNN-based method, the proposed algorithm reduces the average position error by 45.6% and the average orientation error by 67.4% on Microsoft's 7-Scenes dataset, which verifies its effectiveness.
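
The inlier-based selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `FeatureResult`, `relocalize`, `feature_estimator`, and `cnn_regressor` are hypothetical, the threshold `min_inliers=30` is an arbitrary placeholder (the abstract gives no value), and the sketch assumes that a high inlier count favors the feature-based pose while a low count falls back to the CNN.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# A pose as (position xyz, orientation quaternion wxyz); this layout is an assumption.
Pose = Tuple[Tuple[float, float, float], Tuple[float, float, float, float]]


@dataclass
class FeatureResult:
    """Output of a hypothetical BoVW retrieval + PnP/RANSAC stage."""
    pose: Optional[Pose]  # None if PnP/RANSAC produced no pose at all
    num_inliers: int      # number of RANSAC inliers supporting the pose


def relocalize(
    image: object,
    feature_estimator: Callable[[object], FeatureResult],
    cnn_regressor: Callable[[object], Pose],
    min_inliers: int = 30,  # placeholder threshold, not taken from the paper
) -> Tuple[Pose, str]:
    """Return a pose and the name of the branch that produced it."""
    result = feature_estimator(image)
    # Assumed rule: trust the feature-based pose only when enough inliers support it.
    if result.pose is not None and result.num_inliers >= min_inliers:
        return result.pose, "feature"
    # Otherwise fall back to the CNN-regressed pose.
    return cnn_regressor(image), "cnn"
```

In practice the two callables would wrap a real feature pipeline and a trained pose-regression network; here they are left as parameters so the selection logic can be read on its own.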
