Dynamic Keypoint Detection Network for Image Matching

Establishing effective correspondences between a pair of images is difficult because of real-world challenges such as illumination, viewpoint, and scale variations. Modern detector-based methods typically learn fixed detectors from a given dataset, which makes it hard to extract repeatable and reliable keypoints from images with extreme appearance changes or weakly textured scenes. To address this problem, we propose a novel Dynamic Keypoint Detection Network (DKDNet) for robust image matching, built on a dynamic keypoint feature learning module and a guided heatmap activator. DKDNet has two main merits. First, the dynamic keypoint feature learning module generates adaptive keypoint features via an attention mechanism; these features are updated according to the current input image and can capture keypoints with different patterns. Second, the guided heatmap activator fuses multi-group keypoint heatmaps by weighing the importance of different feature channels, which yields more robust keypoint detection. Extensive experiments on four standard benchmarks demonstrate that DKDNet outperforms state-of-the-art image-matching methods by a large margin.
Specifically, DKDNet outperforms the best competing image-matching method by 2.1% in AUC@3px on HPatches, 3.74% in AUC@$5^\circ$ on ScanNet, 7.14% in AUC@$5^\circ$ on MegaDepth, and 12.32% in AUC@$5^\circ$ on YFCC100M.
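
The abstract does not define the AUC@threshold metric it reports. As a minimal sketch, assuming the convention common in recent image-matching evaluations (the area under the cumulative error curve up to the threshold, normalized by the threshold), it could be computed as follows. The function name `error_auc` and the sample errors are hypothetical and are not taken from the paper; the errors would be per-pair pose errors in degrees for AUC@$5^\circ$, or corner errors in pixels for AUC@3px.

```python
from bisect import bisect_left


def error_auc(errors, threshold):
    """Normalized area under the cumulative error curve on [0, threshold].

    Assumed convention (not confirmed by the abstract): recall at error e is
    the fraction of image pairs whose error is at most e, and the curve is
    integrated with the trapezoidal rule, then divided by the threshold.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    errs = sorted(errors)
    n = len(errs)
    if n == 0:
        return 0.0

    # Curve points: the origin, then (error_i, (i + 1) / n) for sorted errors.
    xs = [0.0] + errs
    ys = [0.0] + [(i + 1) / n for i in range(n)]

    # Keep the points strictly below the threshold, then close the curve at
    # the threshold with the last recall value reached.
    last = bisect_left(xs, threshold)
    xs = xs[:last] + [threshold]
    ys = ys[:last] + [ys[last - 1]]

    # Trapezoidal integration over consecutive curve points.
    area = sum(
        (xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2.0
        for i in range(len(xs) - 1)
    )
    return area / threshold


if __name__ == "__main__":
    # Hypothetical pose errors in degrees for three image pairs.
    print(error_auc([1.0, 2.0, 10.0], threshold=5.0))
```

Under this convention the score lies in [0, 1]: it equals 1 when every error is zero and 0 when every error exceeds the threshold, so a gain reported in percent corresponds to a difference between two such scores.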
