Show, Match and Segment: Joint Weakly Supervised Learning of Semantic Matching and Object Co-segmentation.

We present an approach for jointly matching and segmenting object instances of the same category within a collection of images. In contrast to existing algorithms that tackle the tasks of semantic matching and object co-segmentation in isolation, our method exploits the complementary nature of the two tasks. The key insights of our method are two-fold. First, the estimated dense correspondence fields from semantic matching provide supervision for object co-segmentation by enforcing consistency between the predicted masks from a pair of images. Second, the predicted object masks from object co-segmentation, in turn, allow us to reduce the adverse effects due to background clutters for improving semantic matching. Our model is end-to-end trainable and does not require supervision from manually annotated correspondences and object masks. We validate the efficacy of our approach on five benchmark datasets: TSS, Internet, PF-PASCAL, PF-WILLOW, and SPair-71k, and show that our algorithm performs favorably against the state-of-the-art methods on both semantic matching and object co-segmentation tasks.

[1]  Jia-Bin Huang,et al.  DF-Net: Unsupervised Joint Learning of Depth and Flow using Cross-Task Consistency , 2018, ECCV.

[2]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[3]  Jean Ponce,et al.  A graph-matching kernel for object categorization , 2011, 2011 International Conference on Computer Vision.

[4]  Antonio Criminisi,et al.  TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context , 2007, International Journal of Computer Vision.

[5]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Yong Jae Lee,et al.  FlowWeb: Joint image set alignment by weaving consistent, pixel-wise correspondences , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[8]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[9]  Alexander G. Schwing,et al.  VideoMatch: Matching based Video Object Segmentation , 2018, ECCV.

[10]  Jiebo Luo,et al.  Interactively Co-segmentating Topically Related Images with Intelligent Scribble Guidance , 2011, International Journal of Computer Vision.

[11]  Jean Ponce,et al.  SCNet: Learning Semantic Correspondence , 2017, ICCV.

[12]  Jean Ponce,et al.  Discriminative clustering for image co-segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[13]  David W. Jacobs,et al.  WarpNet: Weakly Supervised Matching for Single-View Reconstruction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Stephen Lin,et al.  Discrete-Continuous Transformation Matching for Dense Semantic Correspondence , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Alexei A. Efros,et al.  Learning Dense Correspondence via 3D-Guided Cycle Consistency , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Ian D. Reid,et al.  Weakly Supervised Semantic Segmentation Based on Co-segmentation , 2017, BMVC.

[18]  Vincent Lepetit,et al.  DAISY: An Efficient Dense Descriptor Applied to Wide-Baseline Stereo , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Stephen Lin,et al.  FCSS: Fully Convolutional Self-Similarity for Dense Semantic Correspondence , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Yung-Yu Chuang,et al.  DeepCO3: Deep Instance Co-Segmentation by Co-Peak Search and Co-Saliency Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Maneesh Kumar Singh,et al.  DRIT++: Diverse Image-to-Image Translation via Disentangled Representations , 2019, International Journal of Computer Vision.

[22]  Takeo Kanade,et al.  Distributed cosegmentation via submodular optimization on anisotropic diffusion , 2011, 2011 International Conference on Computer Vision.

[23]  Xiaochun Cao,et al.  Multiple Semantic Matching on Augmented $N$ -Partite Graph for Object Co-Segmentation , 2017, IEEE Transactions on Image Processing.

[24]  Bastian Leibe,et al.  FEELVOS: Fast End-To-End Embedding Learning for Video Object Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Rahul Sukthankar,et al.  MatchNet: Unifying feature and metric learning for patch-based matching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Juergen Gall,et al.  Direct Shot Correspondence Matching , 2018, BMVC.

[27]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[28]  Jean Ponce,et al.  Multi-class cosegmentation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Jianfei Cai,et al.  Image Co-segmentation via Saliency Co-fusion , 2016, IEEE Transactions on Multimedia.

[30]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[31]  Adrian Hilton,et al.  Semantically Coherent Co-Segmentation and Reconstruction of Dynamic Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Antonio Torralba,et al.  SIFT Flow: Dense Correspondence across Scenes and Its Applications , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Chang-Su Kim,et al.  Multiple random walkers and their application to image cosegmentation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Stephen Lin,et al.  DCTM: Discrete-Continuous Transformation Matching for Semantic Flow , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[36]  Stefan Roth,et al.  UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss , 2017, AAAI.

[37]  Jean Ponce,et al.  Proposal Flow , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Hong Chen,et al.  Semantic Aware Attention Based Deep Object Co-segmentation , 2018, ACCV.

[39]  Yung-Yu Chuang,et al.  Deep Video Frame Interpolation Using Cyclic Frame Generation , 2019, AAAI.

[40]  Carsten Rother,et al.  Deep Object Co-Segmentation , 2018, ACCV.

[41]  Jean Ponce,et al.  SPair-71k: A Large-scale Benchmark for Semantic Correspondence , 2019, ArXiv.

[42]  Bing-Yu Chen,et al.  Matching Images With Multiple Descriptors: An Unsupervised Approach for Locally Adaptive Descriptor Selection , 2015, IEEE Transactions on Image Processing.

[43]  Cordelia Schmid,et al.  Proposal Flow: Semantic Correspondences from Object Proposals , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  N. Otsu A threshold selection method from gray level histograms , 1979 .

[45]  Xinlei Chen,et al.  Cycle-Consistency for Robust Visual Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Jean Ponce,et al.  Hyperpixel Flow: Semantic Correspondence With Multi-Layer Neural Features , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[47]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[48]  Ce Liu,et al.  Deformable Spatial Pyramid Matching for Fast Dense Correspondences , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[49]  Stefano Soatto,et al.  Bilateral Cyclic Constraint and Adaptive Regularization for Unsupervised Monocular Depth Prediction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Narendra Ahuja,et al.  DeepMVS: Learning Multi-view Stereopsis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[51]  Juho Kannala,et al.  Semi-supervised Semantic Matching , 2018, ECCV Workshops.

[52]  Xiaoning Qian,et al.  Unsupervised CNN-Based Co-saliency Detection with Graphical Optimization , 2018, ECCV.

[53]  Andrew Blake,et al.  Cosegmentation of Image Pairs by Histogram Matching - Incorporating a Global Constraint into MRFs , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[54]  Jitendra Malik,et al.  Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[55]  Minh N. Do,et al.  DASC: Dense adaptive self-correlation descriptor for multi-modal and multi-spectral correspondence , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Yoichi Sato,et al.  Joint Recovery of Dense Correspondence and Cosegmentation in Two Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Yu-Chiang Frank Wang,et al.  Optimizing the decomposition for multiple foreground cosegmentation , 2015, Comput. Vis. Image Underst..

[58]  Eli Shechtman,et al.  Matching Local Self-Similarities across Images and Videos , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[59]  Tomás Pajdla,et al.  Neighbourhood Consensus Networks , 2018, NeurIPS.

[60]  Fan Yang,et al.  Object-Aware Dense Semantic Correspondence , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[61]  Jan Kautz,et al.  Multimodal Unsupervised Image-to-Image Translation , 2018, ECCV.

[62]  Jiangbo Lu,et al.  DAISY Filter Flow: A Generalized Discrete Approach to Dense Correspondences , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[63]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[64]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[65]  Yung-Yu Chuang,et al.  Robust image alignment with multiple feature descriptors and matching-guided neighborhoods , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[66]  Alexei A. Efros,et al.  Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[67]  Tong Lu,et al.  Deep-dense Conditional Random Fields for Object Co-segmentation , 2017, IJCAI.

[68]  Allan Jabri,et al.  Learning Correspondence From the Cycle-Consistency of Time , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[69]  Björn Ommer,et al.  Deep Semantic Feature Matching , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[70]  Wojciech Matusik,et al.  Image restoration using online photo collections , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[71]  Yen-Yu Lin,et al.  Progressive Feature Matching with Alternate Descriptor Selection and Correspondence Enrichment , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[72]  Ce Liu,et al.  Unsupervised Joint Object Discovery and Segmentation in Internet Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[73]  Andrea Vedaldi,et al.  AnchorNet: A Weakly Supervised Network to Learn Geometry-Sensitive Features for Semantic Matching , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[74]  Seungryong Kim,et al.  PARN: Pyramidal Affine Regression Networks for Dense Semantic Correspondence , 2018, ECCV.

[75]  Yun Fu,et al.  Image Cosegmentation via Saliency-Guided Constrained Clustering with Cosine Similarity , 2017, AAAI.

[76]  Jonathan Tompson,et al.  Temporal Cycle-Consistency Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[77]  Andrea Vedaldi,et al.  Self-Supervised Learning of Geometrically Stable Features Through Probabilistic Introspection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[78]  Stephen Lin,et al.  Recurrent Transformer Networks for Semantic Correspondence , 2018, NeurIPS.

[79]  Xiaowei Zhou,et al.  Multi-image Matching via Fast Alternating Minimization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[80]  Yung-Yu Chuang,et al.  Co-attention CNNs for Unsupervised Object Co-segmentation , 2018, IJCAI.

[81]  Jean Ponce,et al.  Learning Dictionary of Discriminative Part Detectors for Image Categorization and Cosegmentation , 2016, International Journal of Computer Vision.

[82]  Qi Tian,et al.  Recent Advance in Content-based Image Retrieval: A Literature Survey , 2017, ArXiv.

[83]  Feiping Nie,et al.  Object Co-segmentation via Graph Optimized-Flexible Manifold Ranking , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[84]  B. S. Manjunath,et al.  Weakly Supervised Manifold Learning for Dense Semantic Object Correspondence , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[85]  Yi Yang,et al.  Articulated Human Detection with Flexible Mixtures of Parts , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[86]  Jianfei Cai,et al.  Object Co-skeletonization with Co-segmentation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[87]  Josef Sivic,et al.  Convolutional Neural Network Architecture for Geometric Matching , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[88]  Alex Kendall,et al.  End-to-End Learning of Geometry and Context for Deep Stereo Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[89]  Michal Irani,et al.  Co-segmentation by Composition , 2013, 2013 IEEE International Conference on Computer Vision.

[90]  Ming-Hsuan Yang,et al.  Deep Semantic Matching with Foreground Detection and Cycle-Consistency , 2018, ACCV.

[91]  Bohyung Han,et al.  Attentive Semantic Alignment with Offset-Aware Correlation Kernels , 2018, ECCV.

[92]  Xinlei Chen,et al.  Enriching Visual Knowledge Bases via Object Discovery and Segmentation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.