Enhancing two-view correspondence learning by local-global self-attention

Abstract

Seeking reliable correspondences is a fundamental and important task in computer vision. Recent work has shown that this task can be accomplished effectively with a deep network built from multi-layer perceptrons that uses context normalization to process its input. However, context normalization treats every correspondence equally, which weakens the representation capability of potential inliers. To address this problem, we propose a novel and effective Local-Global Self-Attention (LAGA) layer based on the self-attention mechanism. It captures the contextual information of potential inliers from coarse to fine while suppressing outliers in the input. The global self-attention module captures abundant global contextual information across the whole image, and the local self-attention module obtains rich local contextual information within local regions. We then combine the global and local contextual information to obtain richer context and feature maps with stronger representational capacity. Extensive experiments show that networks equipped with the proposed LAGA layer outperform the original networks and other competing networks on outlier removal and camera pose estimation in both outdoor and indoor scenes.
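To see why equal weighting can hurt potential inliers, here is a minimal sketch of context normalization as it is commonly described for MLP-based correspondence networks. Each feature channel is normalized by its mean and standard deviation over all N putative correspondences of one image pair. The plain-list representation, the toy data and the `eps` value are illustrative assumptions, not the original implementation.

```python
import math


def context_norm(features, eps=1e-5):
    """Context normalization for one image pair.

    features: N rows (one per putative correspondence), each with C channels.
    Each channel is shifted and scaled by its mean and standard deviation over
    all N rows, and every row contributes with the same weight 1/N.
    """
    n, c = len(features), len(features[0])
    out = [[0.0] * c for _ in range(n)]
    for j in range(c):
        column = [row[j] for row in features]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        std = math.sqrt(var + eps)  # eps (assumed value) avoids division by zero
        for i in range(n):
            out[i][j] = (features[i][j] - mean) / std
    return out


# Toy single-channel data: three inlier-like rows and one outlier-like row.
inliers = [[1.0], [1.1], [0.9]]
with_outlier = inliers + [[10.0]]
print(context_norm(inliers))
print(context_norm(with_outlier))
```

In the second call the single outlier shifts the shared mean and inflates the shared variance, so the three inlier values are squeezed together: the gap between the largest and smallest inlier drops from about 2.4 to about 0.05 in normalized units.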

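The abstract does not spell out the internals of the LAGA layer, so the following is only a hypothetical sketch of the general idea. It combines global scaled dot-product self-attention over all correspondences, local self-attention restricted to each correspondence's k nearest neighbours, and a fusion of the two. The identity query/key/value projections, the feature-space k-NN neighbourhood, the function names and the residual-sum fusion are all assumptions, not the authors' design.

```python
import math


def softmax(scores):
    # Subtract the maximum score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over a set of keys/values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def global_self_attention(x):
    # Each correspondence attends to all N correspondences (identity Q/K/V assumed).
    return [attend(xi, x, x) for xi in x]


def knn_indices(x, i, k):
    # Assumed neighbourhood: the k rows closest to x[i] in feature space, x[i] included.
    dists = sorted((sum((a - b) ** 2 for a, b in zip(x[i], xj)), j) for j, xj in enumerate(x))
    return [j for _, j in dists[:k]]


def local_self_attention(x, k):
    # Each correspondence attends only to its k nearest neighbours.
    out = []
    for i, xi in enumerate(x):
        nbrs = [x[j] for j in knn_indices(x, i, k)]
        out.append(attend(xi, nbrs, nbrs))
    return out


def local_global_layer(x, k=2):
    # Hypothetical fusion: residual sum of the input, global context and local context.
    g = global_self_attention(x)
    loc = local_self_attention(x, k)
    return [[a + b + c for a, b, c in zip(xi, gi, li)] for xi, gi, li in zip(x, g, loc)]


# Toy features for four correspondences with C = 2 channels.
x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [5.0, -3.0]]
print(local_global_layer(x, k=2))
```

With k = 1 the local branch reduces to the identity, and with k = N it coincides with the global branch; an intermediate k gives a local-region view. A trained layer would use learned projections and operate on feature maps rather than hand-set lists.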