Label to region by bi-layer sparsity priors

In this work, we investigate how to automatically reassign the manually annotated labels at the image-level to those contextually derived semantic regions. First, we propose a bi-layer sparse coding formulation for uncovering how an image or semantic region can be robustly reconstructed from the over-segmented image patches of an image set. We then harness it for the automatic label to region assignment of the entire image set. The solution to bi-layer sparse coding is achieved by convex l1-norm minimization. The underlying philosophy of bi-layer sparse coding is that an image or semantic region can be sparsely reconstructed via the atomic image patches belonging to the images with common labels, while the robustness in label propagation requires that these selected atomic patches come from very few images. Each layer of sparse coding produces the image label assignment to those selected atomic patches and merged candidate regions based on the shared image labels. The results from all bi-layer sparse codings over all candidate regions are then fused to obtain the entire label to region assignments. Besides, the presenting bi-layer sparse coding framework can be naturally applied to perform image annotation on new test images. Extensive experiments on three public image datasets clearly demonstrate the effectiveness of our proposed framework in both label to region assignment and image annotation tasks.

[1]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[2]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[3]  R. Manmatha,et al.  A Model for Learning the Semantics of Pictures , 2003, NIPS.

[4]  Tat-Seng Chua,et al.  NUS-WIDE: a real-world web image database from National University of Singapore , 2009, CIVR '09.

[5]  B. Schiele,et al.  Combined Object Categorization and Segmentation With an Implicit Shape Model , 2004 .

[6]  R. Manmatha,et al.  Multiple Bernoulli relevance models for image and video annotation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[7]  David A. Forsyth,et al.  Body plans , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[8]  Chih-Jen Lin,et al.  Working Set Selection Using Second Order Information for Training Support Vector Machines , 2005, J. Mach. Learn. Res..

[9]  Long Zhu,et al.  Unsupervised Learning of Probabilistic Object Models (POMs) for Object Classification, Segmentation, and Recognition Using Knowledge Propagation , 2009, IEEE Trans. Pattern Anal. Mach. Intell..

[10]  Long Zhu,et al.  Unsupervised learning of probabilistic object models (POMs) for object classification, segmentation and recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Rong Jin,et al.  Correlated Label Propagation with Application to Multi-label Learning , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[13]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[14]  R. Manmatha,et al.  Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[15]  Bin Wang,et al.  Dual cross-media relevance model for image annotation , 2007, ACM Multimedia.

[16]  Martin Szummer,et al.  Indoor-outdoor image classification , 1998, Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database.

[17]  Zhi-Hua Zhou,et al.  ML-KNN: A lazy learning approach to multi-label learning , 2007, Pattern Recognit..

[18]  Richard S. Zemel,et al.  Latent topic random fields: Learning using a taxonomy of labels , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Jason Weston,et al.  A kernel method for multi-labelled classification , 2001, NIPS.

[20]  Daniel P. Huttenlocher,et al.  Weakly Supervised Learning of Part-Based Spatial Models for Visual Object Recognition , 2006, ECCV.

[21]  Nebojsa Jojic,et al.  LOCUS: learning object classes with unsupervised segmentation , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[22]  Rémi Gilleron,et al.  Learning Multi-label Alternating Decision Trees from Texts and Data , 2003, MLDM.

[23]  Serge J. Belongie,et al.  Object categorization using co-occurrence, location and appearance , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Li Fei-Fei,et al.  Spatially coherent latent topic model for concurrent object segmentation and classification , 2007 .

[25]  Anil K. Jain,et al.  On image classification: city vs. landscape , 1998, Proceedings. IEEE Workshop on Content-Based Access of Image and Video Libraries (Cat. No.98EX173).

[26]  Bo Zhang,et al.  Exploiting spatial context constraints for automatic image region annotation , 2007, ACM Multimedia.

[27]  Luo Si,et al.  Effective automatic image annotation via a coherent language model and active learning , 2004, MULTIMEDIA '04.