Consistency Graph Modeling for Semantic Correspondence

To establish robust semantic correspondence between images that depict different instances of the same object category, three types of information are important: the inter-image relationship, the intra-image relationship, and cycle consistency. Most existing methods exploit only one or two of these and cannot make them reinforce and complement each other. In contrast, we propose a novel end-to-end Consistency Graph Modeling Network (CGMNet) for semantic correspondence that models the inter-image relationship, the intra-image relationship, and cycle consistency jointly in a unified deep model. The proposed CGMNet has the following merits. First, to the best of our knowledge, this is the first work to jointly model all three kinds of information in a deep model for semantic correspondence. Second, our model comprises three effective modules, a cross-graph module, an intra-graph module, and a cycle consistency module, which jointly learn discriminative feature representations that are robust to local ambiguities and background clutter. Extensive experiments show that our algorithm performs favorably against state-of-the-art methods on four challenging datasets: PF-PASCAL, PF-WILLOW, Caltech-101, and TSS.
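The abstract does not specify how the three modules are built. As a rough illustration only, the NumPy sketch below assumes that each image is a graph whose nodes are feature-map locations, that the intra-graph and cross-graph modules each perform one round of attention-style message passing (a row softmax over cosine similarities, aggregated with a residual connection), and that cycle consistency is measured by composing soft A-to-B and B-to-A matchings and comparing the result with the identity. All function names, the temperature of 0.1, and the squared-error loss are hypothetical choices, not CGMNet's actual design, and the sketch omits all learnable weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(f, eps=1e-8):
    # Normalize each row (node feature) to unit length.
    return f / (np.linalg.norm(f, axis=1, keepdims=True) + eps)

def graph_message_passing(queries, keys, temperature=0.1):
    # Hypothetical message-passing step: queries (N, C) are updated with
    # features aggregated from keys (M, C). Edge weights are a row softmax
    # over cosine similarities, so each row of adj sums to 1.
    q, k = l2_normalize(queries), l2_normalize(keys)
    adj = softmax(q @ k.T / temperature, axis=1)  # (N, M)
    return queries + adj @ keys                   # residual update, (N, C)

def intra_graph(feats):
    # Intra-image relationship: nodes aggregate from locations of the same image.
    return graph_message_passing(feats, feats)

def cross_graph(feats_a, feats_b):
    # Inter-image relationship: each image's nodes aggregate from the other image.
    return (graph_message_passing(feats_a, feats_b),
            graph_message_passing(feats_b, feats_a))

def soft_correspondence(feats_a, feats_b, temperature=0.1):
    # Row-stochastic (N, M) matching matrix from image A to image B.
    a, b = l2_normalize(feats_a), l2_normalize(feats_b)
    return softmax(a @ b.T / temperature, axis=1)

def cycle_consistency_loss(feats_a, feats_b, temperature=0.1):
    # Map A -> B -> A; a cycle-consistent matching composes to the identity.
    p_ab = soft_correspondence(feats_a, feats_b, temperature)
    p_ba = soft_correspondence(feats_b, feats_a, temperature)
    cycle = p_ab @ p_ba  # (N, N)
    return float(np.mean((cycle - np.eye(cycle.shape[0])) ** 2))

# Toy usage: image B is a shuffled, slightly noisy copy of image A;
# image C is unrelated random features.
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(16, 32))  # 16 nodes (locations), 32-dim features
perm = rng.permutation(16)
feats_b = feats_a[perm] + 0.01 * rng.normal(size=(16, 32))
feats_c = rng.normal(size=(16, 32))

a2, b2 = cross_graph(intra_graph(feats_a), intra_graph(feats_b))
print(cycle_consistency_loss(a2, b2), cycle_consistency_loss(feats_a, feats_c))
```

Because `feats_b` is a permuted copy of `feats_a`, both message-passing steps preserve the permutation, the composed matching stays close to the identity, and the cycle loss is far smaller than for the unrelated pair `feats_a`/`feats_c`.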
