Show, Match and Segment: Joint Weakly Supervised Learning of Semantic Matching and Object Co-segmentation


We present an approach for jointly matching and segmenting object instances of the same category within a collection of images. In contrast to existing algorithms that tackle semantic matching and object co-segmentation in isolation, our method exploits the complementary nature of the two tasks. The key insights of our method are two-fold. First, the dense correspondence fields estimated by semantic matching provide supervision for object co-segmentation by enforcing consistency between the masks predicted for a pair of images. Second, the object masks predicted by co-segmentation, in turn, allow us to suppress the adverse effects of background clutter and thereby improve semantic matching. Our model is end-to-end trainable and requires no manually annotated correspondences or object masks for supervision. We validate the efficacy of our approach on five benchmark datasets (TSS, Internet, PF-PASCAL, PF-WILLOW, and SPair-71k) and show that our algorithm performs favorably against state-of-the-art methods on both the semantic matching and object co-segmentation tasks.
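The two insights above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The following are all hypothetical choices made for illustration:

* the nearest-neighbour warping;
* the L1 mask-consistency loss;
* the 0.5 foreground threshold;
* the function names `warp_with_flow`, `mask_consistency_loss` and `masked_correlation`.

The first function warps one image's mask into the other image using a dense correspondence field and penalizes disagreement. This is the "matching supervises segmentation" direction. The second zeroes out matching scores involving background positions. This is the "segmentation helps matching" direction.

```python
import numpy as np


def warp_with_flow(src, flow):
    """Backward-warp a (H, W) map `src` with a dense flow field (assumed convention).

    flow[y, x] = (dx, dy) means pixel (x, y) in the target image corresponds to
    (x + dx, y + dy) in the source image. Nearest-neighbour sampling keeps the
    sketch short; samples that fall outside the source image are marked invalid.
    """
    h, w = src.shape
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.rint(xs + flow[..., 0]).astype(int)
    sy = np.rint(ys + flow[..., 1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(src, dtype=float)
    out[valid] = src[sy[valid], sx[valid]]
    return out, valid


def mask_consistency_loss(mask_a, mask_b, flow_b_to_a):
    """Hypothetical L1 consistency: mask_b should agree with mask_a warped into image b."""
    warped_a, valid = warp_with_flow(mask_a, flow_b_to_a)
    if not valid.any():
        return 0.0
    return float(np.abs(mask_b - warped_a)[valid].mean())


def masked_correlation(feat_a, feat_b, mask_a, mask_b, thresh=0.5):
    """Cosine-similarity scores between all positions of two (C, H, W) feature maps,
    with every score that involves a background position (mask < thresh) set to 0."""
    c = feat_a.shape[0]
    fa = feat_a.reshape(c, -1)
    fb = feat_b.reshape(c, -1)
    # L2-normalize each spatial feature vector so the dot product is a cosine.
    fa = fa / (np.linalg.norm(fa, axis=0, keepdims=True) + 1e-8)
    fb = fb / (np.linalg.norm(fb, axis=0, keepdims=True) + 1e-8)
    corr = fa.T @ fb  # shape (H_a * W_a, H_b * W_b)
    fg_a = (mask_a.reshape(-1) >= thresh).astype(float)
    fg_b = (mask_b.reshape(-1) >= thresh).astype(float)
    # Keep a score only if both positions are foreground.
    return corr * np.outer(fg_a, fg_b)
```

With identical masks and a zero flow, the loss is 0. The loss is also 0 when `mask_b` is `mask_a` shifted left by one pixel and the flow has `dx = 1` everywhere. In a trainable model, the warp would instead use a differentiable bilinear sampler, so the loss can back-propagate into both the flow and the masks.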

Ming-Hsuan Yang | Jia-Bin Huang | Yun-Chun Chen | Yen-Yu Lin
