Monocular Visual Odometry using Learned Repeatability and Description

Robustness and accuracy of monocular visual odometry (VO) in challenging environments are of wide concern. In this paper, we present a monocular VO system that leverages learned repeatability and description. In a hybrid scheme, the camera pose is first tracked directly on the predicted repeatability maps and then refined through patch-wise 3D-2D association. The local feature parameterization and the adapted mapping module further strengthen the individual components of the system. Extensive evaluations on challenging public datasets are performed, and the competitive camera pose estimation accuracy demonstrates the effectiveness of our method. Additional studies of local reconstruction accuracy and running time show that the system maintains a robust and lightweight backend.
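
The hybrid scheme can be pictured as a two-stage estimator: a coarse pose obtained by direct alignment on the predicted repeatability map, followed by a refinement from 3D-2D correspondences. The sketch below illustrates this structure in Python; it is not the authors' implementation. All names (se3_exp, repeatability_residuals, track_frame), the nearest-neighbour sampling, and the use of an OpenCV iterative PnP step as a stand-in for the patch-wise refinement are illustrative assumptions.

```python
import numpy as np
import cv2
from scipy.optimize import least_squares

def se3_exp(xi):
    """Build a 4x4 camera-from-world pose from a 6-vector
    (axis-angle rotation, translation) via Rodrigues' formula."""
    R, _ = cv2.Rodrigues(np.asarray(xi[:3], dtype=np.float64))
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = xi[3:]
    return T

def repeatability_residuals(xi, pts3d, ref_vals, rep_map, K, T_ref):
    """Direct error: repeatability values stored for the keyframe points
    minus the values sampled in the current repeatability map after
    warping with the pose increment xi (nearest-neighbour for brevity)."""
    T_cur = se3_exp(xi) @ T_ref
    pc = (T_cur[:3, :3] @ pts3d.T).T + T_cur[:3, 3]   # points in camera frame
    uv = (K @ pc.T).T
    uv = uv[:, :2] / uv[:, 2:3]                        # pinhole projection
    h, w = rep_map.shape
    u = np.clip(uv[:, 0], 0, w - 1).astype(int)
    v = np.clip(uv[:, 1], 0, h - 1).astype(int)
    return rep_map[v, u] - ref_vals

def track_frame(rep_map, pts3d, ref_vals, matched_2d, K, T_ref):
    """Two-stage pose tracking: direct alignment, then 3D-2D refinement."""
    # Stage 1: coarse pose by minimizing the direct repeatability error.
    res = least_squares(repeatability_residuals, np.zeros(6),
                        args=(pts3d, ref_vals, rep_map, K, T_ref),
                        loss="huber")
    T_coarse = se3_exp(res.x) @ T_ref

    # Stage 2: refinement from 3D-2D association, approximated here by an
    # iterative PnP on descriptor-matched keypoints (matched_2d <-> pts3d).
    rvec0, _ = cv2.Rodrigues(T_coarse[:3, :3])
    tvec0 = T_coarse[:3, 3].reshape(3, 1).copy()
    ok, rvec, tvec = cv2.solvePnP(
        pts3d.astype(np.float64), matched_2d.astype(np.float64),
        K.astype(np.float64), None, rvec=rvec0, tvec=tvec0,
        useExtrinsicGuess=True, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        return T_coarse
    return se3_exp(np.concatenate([rvec.ravel(), tvec.ravel()]))
```

In a full system the coarse stage would typically run over an image pyramid with bilinear sampling and analytic Jacobians, and the refinement would align local patches rather than single projected pixels; the sketch only conveys the coarse-to-fine structure described in the abstract.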
