Fast ORB-SLAM without Keypoint Descriptors.

Indirect methods for visual SLAM are gaining popularity due to their robustness to environmental variations. ORB-SLAM2 [1] is a benchmark method in this domain; however, it spends significant time computing descriptors that are never reused unless a frame is selected as a keyframe. To address this, we present FastORB-SLAM, which is lightweight and efficient because it tracks keypoints between adjacent frames without computing descriptors. To achieve this, we propose a two-stage descriptor-independent keypoint matching method based on sparse optical flow. In the first stage, we predict initial keypoint correspondences via a simple but effective motion model and then robustly establish the correspondences via pyramid-based sparse optical flow tracking. In the second stage, we leverage the constraints of motion smoothness and epipolar geometry to refine the correspondences. In particular, our method computes descriptors only for keyframes. We test FastORB-SLAM on the TUM and ICL-NUIM RGB-D datasets and compare its accuracy and efficiency to nine existing RGB-D SLAM methods. Qualitative and quantitative results show that our method achieves state-of-the-art accuracy and is about twice as fast as ORB-SLAM2.
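The two-stage matching idea can be illustrated with a minimal sketch. It is not the authors' implementation: the constant-velocity predictor, the function names, the pixel threshold, and the example fundamental matrix are all assumptions made for illustration. The optical-flow tracking step itself is omitted; in practice a pyramidal Lucas-Kanade tracker seeded with the predicted positions could fill that role.

```python
import math

def predict_keypoints(prev_pts, curr_pts):
    """Stage 1 (assumed constant-velocity model): next = curr + (curr - prev)."""
    return [(2 * cx - px, 2 * cy - py)
            for (px, py), (cx, cy) in zip(prev_pts, curr_pts)]

def epipolar_distance(F, p1, p2):
    """Pixel distance from p2 to the epipolar line l = F * [x1, y1, 1]^T."""
    x1, y1 = p1
    a = F[0][0] * x1 + F[0][1] * y1 + F[0][2]
    b = F[1][0] * x1 + F[1][1] * y1 + F[1][2]
    c = F[2][0] * x1 + F[2][1] * y1 + F[2][2]
    norm = math.hypot(a, b)
    if norm == 0.0:
        return math.inf  # degenerate line: treat the match as an outlier
    x2, y2 = p2
    return abs(a * x2 + b * y2 + c) / norm

def refine_matches(F, pts1, pts2, max_dist=1.0):
    """Stage 2: keep indices of matches consistent with the epipolar geometry."""
    return [i for i, (p1, p2) in enumerate(zip(pts1, pts2))
            if epipolar_distance(F, p1, p2) <= max_dist]

# Hypothetical data: two keypoints observed in two consecutive frames.
prev_pts = [(10, 20), (30, 40)]
curr_pts = [(12, 20), (33, 40)]
predicted = predict_keypoints(prev_pts, curr_pts)  # initial guesses for tracking

# Pretend these came back from the tracker; the second one drifted vertically.
tracked = [(14.2, 20.3), (36.0, 45.0)]

# Fundamental matrix of a pure horizontal translation (epipolar lines y2 = y1).
F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
inliers = refine_matches(F, curr_pts, tracked, max_dist=1.0)
print(predicted, inliers)
```

No descriptor is computed anywhere in this sketch: correspondences come from prediction plus tracking, and geometric consistency alone decides which ones survive.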

[1] Stefan Leutenegger, et al. Fusion++: Volumetric Object-Level SLAM, 2018, 2018 International Conference on 3D Vision (3DV).

[2] Torsten Sattler, et al. BAD SLAM: Bundle Adjusted Direct RGB-D SLAM, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Dinesh Atchuthan, et al. A micro Lie theory for state estimation in robotics, 2018, ArXiv.

[4] Andrew J. Davison, et al. A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM, 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[5] Ken Sakurada, et al. OpenVSLAM: A Versatile Visual SLAM Framework, 2019, ACM Multimedia.

[6] Javier Civera, et al. DynaSLAM: Tracking, Mapping, and Inpainting in Dynamic Scenes, 2018, IEEE Robotics and Automation Letters.

[7] Dorian Gálvez-López, et al. Bags of Binary Words for Fast Place Recognition in Image Sequences, 2012, IEEE Transactions on Robotics.

[8] Shilin Zhou, et al. BoCNF: efficient image matching with Bag of ConvNet features for scalable and robust visual place recognition, 2018, Auton. Robots.

[9] Hongmin Liu, et al. Deep Unsupervised Binary Descriptor Learning Through Locality Consistency and Self Distinctiveness, 2021, IEEE Transactions on Multimedia.

[10] Yasuyuki Matsushita, et al. GMS: Grid-Based Motion Statistics for Fast, Ultra-robust Feature Correspondence, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Hironobu Fujiyoshi, et al. Coarse-to-Fine Deep Orientation Estimator for Local Image Matching, 2019, ACPR.

[12] Kurt Konolige, et al. Double window optimisation for constant time visual SLAM, 2011, 2011 International Conference on Computer Vision.

[13] Jörg Stückler, et al. Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry, 2018, ECCV.

[14] Wolfram Burgard, et al. A benchmark for the evaluation of RGB-D SLAM systems, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[15] Thomas Brox, et al. FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Paul Newman, et al. NID-SLAM: Robust Monocular SLAM Using Normalised Information Distance, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Daniel Cremers, et al. Direct Sparse Odometry, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18] Qianqian Zhang, et al. Features Combined Binary Descriptor Based on Voted Ring-Sampling Pattern, 2020, IEEE Transactions on Circuits and Systems for Video Technology.

[19] Yangang Wang, et al. SRHandNet: Real-Time 2D Hand Pose Estimation With Simultaneous Region Localization, 2019, IEEE Transactions on Image Processing.

[20] Ross B. Girshick, et al. Mask R-CNN, 2017, arXiv:1703.06870.

[21] Nan Yang, et al. D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22] Jianping Shi, et al. CamNet: Coarse-to-Fine Retrieval for Camera Re-Localization, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[23] Jörg Stückler, et al. Visual-Inertial Mapping With Non-Linear Factor Recovery, 2019, IEEE Robotics and Automation Letters.

[24] John J. Leonard, et al. Kintinuous: Spatially Extended KinectFusion, 2012, AAAI 2012.

[25] Eijiro Takeuchi, et al. Monocular Vision-Based Localization Using ORB-SLAM with LIDAR-Aided Mapping in Real-World Robot Challenge, 2016, J. Robotics Mechatronics.

[26] Gary R. Bradski, et al. ORB: An efficient alternative to SIFT or SURF, 2011, 2011 International Conference on Computer Vision.

[27] Gavin Brown, et al. ORB-SLAM-CNN: Lessons in Adding Semantic Map Construction to Feature-Based SLAM, 2019, TAROS.

[28] Weinan Chen, et al. A Comparison of CNN-Based and Hand-Crafted Keypoint Descriptors, 2019, 2019 International Conference on Robotics and Automation (ICRA).

[29] Davide Scaramuzza, et al. SVO: Fast semi-direct monocular visual odometry, 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[30] Matthias Nießner, et al. BundleFusion, 2016, TOGS.

[31] Xiaoou Tang, et al. LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32] Daniel Cremers, et al. Direct Sparse Visual-Inertial Odometry Using Dynamic Marginalization, 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[33] Yipu Zhao, et al. Low-latency Visual SLAM with Appearance-Enhanced Local Map Building, 2019, 2019 International Conference on Robotics and Automation (ICRA).

[34] Jiyu Cheng, et al. Improving monocular visual SLAM in dynamic environments: an optical-flow-based approach, 2019, Adv. Robotics.

[35] Wei Sun, et al. A Robust RGB-D SLAM System With Points and Lines for Low Texture Indoor Environments, 2019, IEEE Sensors Journal.

[36] Stefan Leutenegger, et al. ElasticFusion: Real-time dense SLAM and light source estimation, 2016, Int. J. Robotics Res.

[37] Federico Tombari, et al. CNN-SLAM: Real-Time Dense Monocular SLAM with Learned Depth Prediction, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38] Jiwen Lu, et al. Unsupervised Deep Learning of Compact Binary Descriptors, 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39] Daniel Cremers, et al. Stereo DSO: Large-Scale Direct Sparse Visual Odometry with Stereo Cameras, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[40] Yoshihiko Nakamura, et al. FlowFusion: Dynamic Dense RGB-D SLAM Based on Optical Flow, 2020, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[41] J. M. M. Montiel, et al. ORB-SLAM: A Versatile and Accurate Monocular SLAM System, 2015, IEEE Transactions on Robotics.

[42] Jörg Stückler, et al. Direct Sparse Odometry with Rolling Shutter, 2018, ECCV.

[43] Mingui Sun, et al. Robust Robot Pose Estimation for Challenging Scenes With an RGB-D Camera, 2019, IEEE Sensors Journal.

[44] Wolfram Burgard, et al. G2o: A general framework for graph optimization, 2011, 2011 IEEE International Conference on Robotics and Automation.

[45] Evgeni Magid, et al. Comparative analysis of ROS-based monocular SLAM methods for indoor navigation, 2017, International Conference on Machine Vision.

[46] Javier Gonzalez-Jimenez, et al. PL-SLAM: A Stereo SLAM System Through the Combination of Points and Line Segments, 2017, IEEE Transactions on Robotics.

[47] Nanning Zheng, et al. Visual Semantic SLAM with Landmarks for Large-Scale Outdoor Environment, 2020, ArXiv.

[48] Gang Xu, et al. Epipolar Geometry in Stereo, Motion and Object Recognition, 1996, Computational Imaging and Vision.

[49] Juan D. Tardós, et al. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras, 2016, IEEE Transactions on Robotics.

[50] Syamsiah Mashohor, et al. CNN-SVO: Improving the Mapping in Semi-Direct Visual Odometry Using Single-Image Depth Prediction, 2018, 2019 International Conference on Robotics and Automation (ICRA).

[51] Jonathan Krynitsky, et al. Three-Dimensional Pose Estimation for Laboratory Mouse From Monocular Images, 2019, IEEE Transactions on Image Processing.

[52] Pascal Fua, et al. A Performance Evaluation of Local Features for Image-Based 3D Reconstruction, 2017, IEEE Transactions on Image Processing.

[53] Jiwen Lu, et al. Efficient nearest neighbor search in high dimensional hamming space, 2020, Pattern Recognit.

[54] Jiwen Lu, et al. Learning Deep Binary Descriptor with Multi-Quantization, 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[55] Richard Elvira, et al. ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM, 2021, IEEE Transactions on Robotics.

[56] Qi Wei, et al. DS-SLAM: A Semantic Visual SLAM towards Dynamic Environments, 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[57] Wei Yang, et al. DXSLAM: A Robust and Efficient Visual SLAM System with Deep Features, 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[58] G. Klein, et al. Parallel Tracking and Mapping for Small AR Workspaces, 2007, 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality.

[59] J. Alex Stark, et al. Adaptive image contrast enhancement using generalizations of histogram equalization, 2000, IEEE Trans. Image Process.

[60] Shaojie Shen, et al. A General Optimization-based Framework for Local Odometry Estimation with Multiple Sensors, 2019, ArXiv.

[61] Lu Fang, et al. Real-Time Global Registration for Globally Consistent RGB-D SLAM, 2019, IEEE Transactions on Robotics.

[62] Daniel Cremers, et al. Dense visual SLAM for RGB-D cameras, 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[63] Michael R. Lyu, et al. SelFlow: Self-Supervised Learning of Optical Flow, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[64] Michael Gassner, et al. SVO: Semidirect Visual Odometry for Monocular and Multicamera Systems, 2017, IEEE Transactions on Robotics.

[65] Tao Zhang, et al. Vision-Based Pose Estimation From Points With Unknown Correspondences, 2014, IEEE Transactions on Image Processing.

[66] Chunlong He, et al. Kalman-Filter-Based Integration of IMU and UWB for High-Accuracy Indoor Positioning and Navigation, 2020, IEEE Internet of Things Journal.

[67] Hong Zhang, et al. Combining Multiple Image Descriptions for Loop Closure Detection, 2018, J. Intell. Robotic Syst.

[68] Wolfram Burgard, et al. 3-D Mapping With an RGB-D Camera, 2014, IEEE Transactions on Robotics.

[69] Shaojie Shen, et al. VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator, 2017, IEEE Transactions on Robotics.

[70] Daniel Cremers, et al. LDSO: Direct Sparse Odometry with Loop Closure, 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[71] Hak-Keung Lam, et al. Joint optimization based on direct sparse stereo visual-inertial odometry, 2020, Auton. Robots.