Co-Saliency Detection via a Self-Paced Multiple-Instance Learning Framework

As an interesting and emerging topic, co-saliency detection aims at simultaneously extracting common salient objects from a group of images. On the one hand, traditional co-saliency detection approaches rely heavily on human knowledge to design hand-crafted metrics intended to reflect the properties of the co-salient regions. Such strategies, however, often generalize poorly and cannot adapt flexibly to the varied scenarios of real applications. On the other hand, most current methods pursue co-saliency detection in an unsupervised fashion. This tends to weaken their performance in complex real-world scenarios because they lack a robust learning mechanism that makes full use of the weak labels of each image. To alleviate these two problems, this paper proposes a new SP-MIL framework for co-saliency detection, which integrates multiple-instance learning (MIL) and self-paced learning (SPL) into a unified learning framework. Specifically, for the first problem, we formulate co-saliency detection as a MIL paradigm that learns discriminative classifiers to detect co-salient objects at the “instance level”. The formulated MIL component enables our method to automatically produce proper metrics for measuring intra-image contrast and inter-image consistency, so that co-saliency is detected in a purely self-learning way. For the second problem, the embedded SPL paradigm alleviates the data ambiguity that arises under the weak supervision of co-saliency detection and guides learning robustly in complex scenarios. Experiments on benchmark datasets, together with multiple extended computer vision applications, demonstrate the superiority of the proposed framework over state-of-the-art methods.
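The self-paced learning idea mentioned above can be illustrated with a minimal sketch. This is not the SP-MIL formulation of the paper: it assumes the standard hard-weighting SPL scheme (a sample is selected when its loss falls below an age parameter), and a weighted ridge classifier stands in for the MIL classifier. The names `fit_weighted_ridge`, `self_paced_fit`, `lam`, and `growth` are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_weighted_ridge(X, y, v, reg=1e-3):
    """Closed-form solution of min_w sum_i v_i (x_i.w - y_i)^2 + reg * ||w||^2."""
    Xv = X * v[:, None]                      # scale each row by its sample weight
    A = X.T @ Xv + reg * np.eye(X.shape[1])  # regularized normal-equation matrix
    b = Xv.T @ y
    return np.linalg.solve(A, b)


def self_paced_fit(X, y, lam=0.5, growth=1.5, n_rounds=5):
    """Alternate between fitting the model and re-selecting "easy" samples.

    Assumption: hard SPL weights, v_i = 1 if loss_i < lam else 0, with the
    age parameter lam enlarged every round so harder samples enter later.
    """
    v = np.ones(X.shape[0])                  # start by trusting every sample
    for _ in range(n_rounds):
        w = fit_weighted_ridge(X, y, v)      # model step with fixed weights
        losses = (X @ w - y) ** 2            # per-sample squared loss
        v = (losses < lam).astype(float)     # selection step with fixed model
        if v.sum() == 0:                     # always keep the easiest sample
            v[np.argmin(losses)] = 1.0
        lam *= growth                        # admit harder samples next round
    return w, v


# Toy usage: two Gaussian blobs with labels -1/+1 and a few flipped labels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (20, 2)), rng.normal(1.0, 0.5, (20, 2))])
X = np.hstack([X, np.ones((40, 1))])         # constant column acts as a bias
y = np.concatenate([-np.ones(20), np.ones(20)])
y[:3] *= -1                                  # simulate ambiguous weak labels
w, v = self_paced_fit(X, y)
```

In this sketch `w` is the model from the last fitting step and `v` marks the samples that would be used in the next one. In a weakly supervised setting such as the one described above, samples with ambiguous labels tend to incur larger losses and are therefore more likely to be excluded in early rounds.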
