Multi-scale Matching Networks for Semantic Correspondence

Deep features have been proven powerful in building accurate dense semantic correspondences in various previous works. However, the multi-scale and pyramidal hierarchy of convolutional neural networks has not been well studied to learn discriminative pixel-level features for semantic correspondence. In this paper, we propose a multiscale matching network that is sensitive to tiny semantic differences between neighboring pixels. We follow the coarse-to-fine matching strategy and build a top-down feature and matching enhancement scheme that is coupled with the multi-scale hierarchy of deep convolutional neural networks. During feature enhancement, intra-scale enhancement fuses same-resolution feature maps from multiple layers together via local self-attention and cross-scale enhancement hallucinates higher-resolution feature maps along the top-down pathway. Besides, we learn complementary matching details at different scales thus the overall matching score is refined by features of different semantic levels gradually. Our multi-scale matching network can be trained end-to-end easily with few additional *Corresponding author: wfge@fudan.edu.cn learnable parameters. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on three popular benchmarks with high computational efficiency. The code has been released at https: //github.com/wintersun661/MMNet.

[1]  Jan Kautz,et al.  Models Matter, So Does Training: An Empirical Study of CNNs for Optical Flow Estimation , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Ming Yang,et al.  Bi-Directional Cascade Network for Perceptual Edge Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Yoichi Sato,et al.  Joint Recovery of Dense Correspondence and Cosegmentation in Two Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  V. Prisacariu,et al.  Dual-Resolution Correspondence Networks , 2020, NeurIPS.

[5]  Jonathan Krause,et al.  Fine-grained recognition without part annotations , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Jean Ponce,et al.  Learning to Compose Hypercolumns for Visual Correspondence , 2020, ECCV.

[7]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Xu Yang,et al.  A Continuation Method for Graph Matching Based Feature Correspondence , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Shu Liu,et al.  Path Aggregation Network for Instance Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Yinda Zhang,et al.  ActiveStereoNet: End-to-End Self-Supervised Learning for Active Stereo Systems , 2018, ECCV.

[11]  Tomás Pajdla,et al.  Neighbourhood Consensus Networks , 2018, NeurIPS.

[12]  Cordelia Schmid,et al.  Proposal Flow: Semantic Correspondences from Object Proposals , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Leonhard Hennig,et al.  Fine-tuning Pre-Trained Transformer Language Models to Distantly Supervised Relation Extraction , 2019, ACL.

[14]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[15]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[16]  Xiaoou Tang,et al.  LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[18]  David W. Jacobs,et al.  WarpNet: Weakly Supervised Matching for Single-View Reconstruction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[20]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Jean Ponce,et al.  SFNet: Learning Object-Aware Semantic Correspondence , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Fuchun Sun,et al.  HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Stefan Roth,et al.  Joint Optical Flow and Temporally Consistent Semantic Segmentation , 2016, ECCV Workshops.

[24]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[25]  Torsten Sattler,et al.  DGC-Net: Dense Geometric Correspondence Network , 2018, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[26]  Jean Ponce,et al.  Hyperpixel Flow: Semantic Correspondence With Multi-Layer Neural Features , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[27]  Kai Han,et al.  Correspondence Networks With Adaptive Neighbourhood Consensus , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Cordelia Schmid,et al.  Unsupervised object discovery and localization in the wild: Part-based matching with bottom-up region proposals , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[30]  Josef Sivic,et al.  Convolutional Neural Network Architecture for Geometric Matching , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Josef Sivic,et al.  End-to-End Weakly-Supervised Semantic Alignment , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Junchi Yan,et al.  Learning Combinatorial Embedding Networks for Deep Graph Matching , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[33]  Jean Ponce,et al.  SCNet: Learning Semantic Correspondence , 2017, ICCV.

[34]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[35]  Pietro Perona,et al.  Caltech-UCSD Birds 200 , 2010 .

[36]  Bohyung Han,et al.  Attentive Semantic Alignment with Offset-Aware Correlation Kernels , 2018, ECCV.

[37]  Jan Kautz,et al.  PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Guosheng Lin,et al.  DeepEMD: Few-Shot Image Classification With Differentiable Earth Mover’s Distance and Structured Classifiers , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Silvio Savarese,et al.  Universal Correspondence Network , 2016, NIPS.

[40]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[41]  Bin Li,et al.  Deformable DETR: Deformable Transformers for End-to-End Object Detection , 2020, ICLR.

[42]  Qiong Yan,et al.  Cascade Residual Learning: A Two-Stage Convolutional Neural Network for Stereo Matching , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[43]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[44]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[45]  Xuming He,et al.  Dynamic Context Correspondence Network for Semantic Alignment , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[46]  Vincent Lepetit,et al.  Learning descriptors for object recognition and 3D pose estimation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Vincent Lepetit,et al.  DAISY: An Efficient Dense Descriptor Applied to Wide-Baseline Stereo , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Makoto Yamada,et al.  Semantic Correspondence as an Optimal Transport Problem , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Jean Ponce,et al.  SPair-71k: A Large-scale Benchmark for Semantic Correspondence , 2019, ArXiv.

[50]  Radu Timofte,et al.  GLU-Net: Global-Local Universal Network for Dense Flow and Correspondences , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Jean Ponce,et al.  A graph-matching kernel for object categorization , 2011, 2011 International Conference on Computer Vision.

[52]  Nicolas Usunier,et al.  End-to-End Object Detection with Transformers , 2020, ECCV.

[53]  Torsten Sattler,et al.  A Cross-Season Correspondence Dataset for Robust Semantic Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Ying Chen,et al.  M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network , 2018, AAAI.

[55]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[56]  Stephen Lin,et al.  Recurrent Transformer Networks for Semantic Correspondence , 2018, NeurIPS.

[57]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Jinhui Tang,et al.  Feature Pyramid Transformer , 2020, ECCV.

[59]  Stephen Lin,et al.  DCTM: Discrete-Continuous Transformation Matching for Semantic Flow , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[60]  Fred L. Bookstein,et al.  Principal Warps: Thin-Plate Splines and the Decomposition of Deformations , 1989, IEEE Trans. Pattern Anal. Mach. Intell..