ASD-SLAM: A Novel Adaptive-Scale Descriptor Learning for Visual SLAM

Visual Odometry and Simultaneous Localization and Mapping (SLAM) are widely used in autonomous driving. In traditional keypoint-based visual SLAM systems, the feature matching accuracy of the front end plays a decisive role and becomes the bottleneck restricting positioning accuracy, especially in challenging scenarios such as viewpoint variation and highly repetitive scenes. Increasing the discriminability and matchability of feature descriptors is therefore important for improving the positioning accuracy of visual SLAM. In this paper, we propose a novel adaptive-scale triplet loss function and apply it to a triplet network to generate an adaptive-scale descriptor (ASD). Based on ASD, we design a monocular SLAM system (ASD-SLAM), a deep-learning-enhanced system built on the state-of-the-art ORB-SLAM system. Experimental results show that ASD achieves better performance on the UBC benchmark dataset, and that the ASD-SLAM system outperforms current popular visual SLAM frameworks on the KITTI Odometry Dataset.
