Learning Latent Stable Patterns for Image Understanding With Weak and Noisy Labels

This paper focuses on weakly supervised image understanding, in which semantic labels are available only at the image level, without the location of the specific object or scene in the image. Existing algorithms implicitly assume that image-level labels are error-free, which may be too restrictive. In practice, image labels obtained from pretrained predictors are easily contaminated. To solve this problem, we propose a novel algorithm for weakly supervised segmentation when only noisy image labels are available during training. More specifically, a semantic space is first constructed by encoding image labels through a graphlet (i.e., superpixel cluster) embedding process. We then observe that, in the semantic space, the distribution of graphlets from images with the same label remains stable regardless of the noise in the image labels. Therefore, we propose a generative model, called latent stability analysis, to discover the stable patterns from images with noisy labels. Inferring graphlet semantics from these mid-level stable patterns is more reliable and accurate than directly transferring noisy image-level labels to different regions. Finally, we calculate the semantics of each superpixel by majority voting over its correlated graphlets. Comprehensive experimental results show that our algorithm performs impressively when the image labels are predicted by either hand-crafted or deeply learned image descriptors.
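The final voting step can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that the "correlated graphlets" of a superpixel are simply the graphlets that contain it, that every graphlet casts one unweighted vote, and that graphlet labels have already been inferred. The function name `vote_superpixel_labels` and the toy labels are hypothetical.

```python
from collections import Counter


def vote_superpixel_labels(graphlets, graphlet_labels, num_superpixels):
    """Assign each superpixel the most frequent label among its graphlets.

    graphlets: list of tuples of superpixel indices (assumed representation).
    graphlet_labels: one inferred label per graphlet.
    Returns a list with one label per superpixel, or None if a superpixel
    belongs to no graphlet.
    """
    votes = [Counter() for _ in range(num_superpixels)]
    for members, label in zip(graphlets, graphlet_labels):
        for sp in members:
            votes[sp][label] += 1  # one unweighted vote per containing graphlet
    return [v.most_common(1)[0][0] if v else None for v in votes]


# Toy example: 5 superpixels, 4 graphlets with already-inferred labels.
graphlets = [(0, 1), (1, 2), (0, 1, 2), (2, 3)]
graphlet_labels = ["sky", "grass", "sky", "grass"]
labels = vote_superpixel_labels(graphlets, graphlet_labels, num_superpixels=5)
print(labels)
```

In this toy case superpixel 1 receives two "sky" votes and one "grass" vote, so it is labeled "sky"; superpixel 4 belongs to no graphlet and stays unlabeled. A weighted vote (for example, by graphlet confidence) would be a natural variation, but the abstract does not say whether one is used.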
