Deep Object Co-segmentation via Spatial-Semantic Network Modulation

Object co-segmentation aims to segment the objects shared by multiple related images, and it has numerous applications in computer vision. This paper presents a deep network framework for object co-segmentation that modulates image features both spatially and semantically. A backbone network extracts multi-resolution image features. Taking the multi-resolution features of the related images as input, we design a spatial modulator that learns a mask for each image. The spatial modulator captures the correlations among image feature descriptors via unsupervised learning, and the learned mask roughly localizes the shared foreground object while suppressing the background. We formulate the semantic modulator as a supervised image classification task and propose a hierarchical second-order pooling module to transform the image features for classification. The outputs of the two modulators manipulate the multi-resolution features through a shift-and-scale operation so that the features focus on segmenting the co-object regions. The proposed model is trained end-to-end without any intricate post-processing. Extensive experiments on four image co-segmentation benchmark datasets demonstrate that the proposed method achieves higher accuracy than state-of-the-art methods.
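The abstract names three operations without giving their formulas. The NumPy sketch below illustrates each one under stated assumptions; it is not the authors' implementation. The spatial mask uses a simple cross-image cosine-similarity heuristic (a location scores high when it closely matches some location in every other image) as a stand-in for the paper's learned unsupervised modulator. Second-order pooling is plain covariance pooling, made "hierarchical" here by concatenating the covariance statistics of each resolution level. The shift-and-scale step assumes, in the spirit of feature-wise linear modulation (FiLM), that the semantic branch supplies a per-channel scale and the spatial mask supplies a per-location shift. All function names and the random classifier weights `W` are hypothetical.

```python
import numpy as np


def spatial_masks(features, eps=1e-8):
    """Heuristic co-attention masks (hypothetical stand-in for the learned modulator).

    features: list of >= 2 arrays of shape (H, W, C), one per image, sharing C.
    Returns a list of (H, W) masks in [0, 1]. A location scores high when its
    descriptor has a close cosine match in every other image.
    """
    flat = []
    for f in features:
        d = f.reshape(-1, f.shape[-1])
        flat.append(d / (np.linalg.norm(d, axis=1, keepdims=True) + eps))
    masks = []
    for i, (f, di) in enumerate(zip(features, flat)):
        # Best-match similarity against each other image, averaged over images.
        scores = np.mean([np.max(di @ dj.T, axis=1)
                          for j, dj in enumerate(flat) if j != i], axis=0)
        lo, hi = scores.min(), scores.max()
        masks.append(((scores - lo) / (hi - lo + eps)).reshape(f.shape[:2]))
    return masks


def second_order_pool(feat):
    """Covariance pooling: (H, W, C) -> upper triangle of the C x C covariance."""
    d = feat.reshape(-1, feat.shape[-1])
    d = d - d.mean(axis=0)
    cov = d.T @ d / d.shape[0]
    return cov[np.triu_indices(cov.shape[0])]


def hierarchical_second_order_pool(pyramid):
    """Concatenate the covariance statistics of every resolution level (assumed scheme)."""
    return np.concatenate([second_order_pool(f) for f in pyramid])


def modulate(feat, gamma, beta):
    """Shift-and-scale: per-channel scale gamma (C,), per-location shift beta (H, W)."""
    return gamma[None, None, :] * feat + beta[:, :, None]


# Toy input: three 4x4 feature maps with C = 8 channels.
rng = np.random.default_rng(0)
C = 8
images = []
for k in range(3):
    f = np.zeros((4, 4, C))
    f[:, :] = np.eye(C)[k + 1]  # background: a different direction per image
    f[:2, :2] = np.eye(C)[0]    # shared object in the top-left 2x2 block
    images.append(f)

masks = spatial_masks(images)

# Semantic branch for image 0: pool a two-level pyramid, then apply a
# hypothetical linear classifier head to produce per-channel scales.
pyramid = [images[0], images[0][::2, ::2]]
pooled = hierarchical_second_order_pool(pyramid)
W = 0.1 * rng.standard_normal((C, pooled.size))
gamma = 1.0 + np.tanh(W @ pooled)

out = modulate(images[0], gamma, masks[0])
```

In the toy input the shared object's descriptor recurs in all three images while each background uses its own direction, so the masks are close to 1 on the top-left block and 0 elsewhere. With real backbone features the separation is far less clean.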
