DF-SLAM: A Deep-Learning Enhanced Visual SLAM System based on Deep Local Features

As the foundation of driverless vehicle and intelligent robots, Simultaneous Localization and Mapping(SLAM) has attracted much attention these days. However, non-geometric modules of traditional SLAM algorithms are limited by data association tasks and have become a bottleneck preventing the development of SLAM. To deal with such problems, many researchers seek to Deep Learning for help. But most of these studies are limited to virtual datasets or specific environments, and even sacrifice efficiency for accuracy. Thus, they are not practical enough. We propose DF-SLAM system that uses deep local feature descriptors obtained by the neural network as a substitute for traditional hand-made features. Experimental results demonstrate its improvements in efficiency and stability. DF-SLAM outperforms popular traditional SLAM systems in various scenes, including challenging scenes with intense illumination changes. Its versatility and mobility fit well into the need for exploring new environments. Since we adopt a shallow network to extract local descriptors and remain others the same as original SLAM systems, our DF-SLAM can still run in real-time on GPU.

[1]  Alessio Del Bue,et al.  Probabilistic Structure from Motion with Objects (PSfMO) , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[2]  Anil K. Jain,et al.  On-line signature verification, , 2002, Pattern Recognit..

[3]  Pascal Fua,et al.  LF-Net: Learning Local Features from Images , 2018, NeurIPS.

[4]  Gustavo Carneiro,et al.  Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue , 2016, ECCV.

[5]  Ian D. Reid Towards semantic visual SLAM , 2014, 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV).

[6]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Weilin Huang,et al.  Local Multi-Grouped Binary Descriptor With Ring-Based Pooling Configuration and Optimization , 2015, IEEE Transactions on Image Processing.

[8]  Bohyung Han,et al.  Large-Scale Image Retrieval with Attentive Deep Local Features , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[9]  Roberto Cipolla,et al.  PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  Tomasz Malisiewicz,et al.  SuperPoint: Self-Supervised Interest Point Detection and Description , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[11]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[12]  Juan D. Tardós,et al.  ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras , 2016, IEEE Transactions on Robotics.

[13]  Vincent Lepetit,et al.  Boosting Binary Keypoint Descriptors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Ali Farhadi,et al.  Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[15]  Sean L. Bowman,et al.  Probabilistic data association for semantic SLAM , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[16]  Jörg Stückler,et al.  Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry , 2018, ECCV.

[17]  Andrea Vedaldi,et al.  HPatches: A Benchmark and Evaluation of Handcrafted and Learned Local Descriptors , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Jiri Matas,et al.  Working hard to know your neighbor's margins: Local descriptor learning loss , 2017, NIPS.

[19]  Gustavo Carneiro,et al.  Learning Local Image Descriptors with Deep Siamese and Triplet Convolutional Networks by Minimizing Global Loss Functions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Yung-Yu Chuang,et al.  DeepCD: Learning Deep Complementary Descriptors for Patch Representations , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Wenqi Wu,et al.  Tightly-Coupled Stereo Visual-Inertial Navigation Using Point and Line Features , 2015, Sensors.

[22]  Stefan Leutenegger,et al.  SemanticFusion: Dense 3D semantic mapping with convolutional neural networks , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[23]  Bin Fan,et al.  Affine Subspace Representation for Feature Description , 2014, ECCV.

[24]  Xiaolin Hu,et al.  Delving deeper into convolutional neural networks for camera relocalization , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[25]  Nikos Komodakis,et al.  Learning to compare image patches via convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  V. Matousek,et al.  Signature verification using ART-2 neural network , 2002, Proceedings of the 9th International Conference on Neural Information Processing, 2002. ICONIP '02..

[27]  Nathan Jacobs,et al.  Revisiting IM2GPS in the Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[28]  Jiri Matas,et al.  Repeatability Is Not Enough: Learning Affine Regions via Discriminability , 2017, ECCV.

[29]  Yann LeCun,et al.  Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches , 2015, J. Mach. Learn. Res..

[30]  Vincent Lepetit,et al.  LIFT: Learned Invariant Feature Transform , 2016, ECCV.

[31]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[32]  Daniel Cremers,et al.  Direct Sparse Odometry , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Yann LeCun,et al.  Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[34]  Paul H. J. Kelly,et al.  SLAM++: Simultaneous Localisation and Mapping at the Level of Objects , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Rahul Sukthankar,et al.  MatchNet: Unifying feature and metric learning for patch-based matching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Steven Henikoff,et al.  SIFT: predicting amino acid changes that affect protein function , 2003, Nucleic Acids Res..

[37]  Yan Lu,et al.  Local Descriptors Optimized for Average Precision , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  John J. Leonard,et al.  Monocular SLAM Supported Object Recognition , 2015, Robotics: Science and Systems.

[39]  Dongbing Gu,et al.  UnDeepVO: Monocular Visual Odometry Through Unsupervised Deep Learning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[40]  Raquel Urtasun,et al.  Efficient Deep Learning for Stereo Matching , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Yan Ke,et al.  PCA-SIFT: a more distinctive representation for local image descriptors , 2004, CVPR 2004.

[42]  Hengzhu Liu,et al.  Efficient deep learning for stereo matching with larger image patches , 2017, 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI).

[43]  Krystian Mikolajczyk,et al.  Learning local feature descriptors with triplets and shallow convolutional neural networks , 2016, BMVC.

[44]  Anelia Angelova,et al.  Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45]  Pascal Fua,et al.  Receptive Fields Selection for Binary Feature Description , 2014, IEEE Transactions on Image Processing.

[46]  Rahul Sukthankar,et al.  Cognitive Mapping and Planning for Visual Navigation , 2017, International Journal of Computer Vision.

[47]  Dimitrios G. Kottas,et al.  Efficient and consistent vision-aided inertial navigation using line observations , 2013, 2013 IEEE International Conference on Robotics and Automation.

[48]  Tom Duckett,et al.  Image features for visual teach-and-repeat navigation in changing environments , 2017, Robotics Auton. Syst..

[49]  Iasonas Kokkinos,et al.  Discriminative Learning of Deep Convolutional Feature Point Descriptors , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[50]  Bin Fan,et al.  L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  George J. Pappas,et al.  Semantic Localization Via the Matrix Permanent , 2014, Robotics: Science and Systems.

[52]  Javier Civera,et al.  Towards semantic SLAM using a monocular camera , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[53]  Krystian Mikolajczyk,et al.  BOLD - Binary online learned descriptor for efficient image matching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).