Contextual decomposition of multi-label images

Most research on image decomposition, such as image segmentation and image parsing, has focused on low-level visual cues within a single image and has neglected contextual information across different images. In this paper, we present a new perspective on image decomposition, guided by the multiple labels associated with individual images. We observe that context information exists across images: local representations of the same label are similar, while those of different labels are dissimilar. We therefore propose to decompose images collectively, and we formulate image decomposition as an optimization problem that maximizes inter-label difference while minimizing intra-label difference of the target label representations. Such contextual image decomposition has a wide variety of applications, of which two are exemplary: 1) multi-label image annotation, in which the sparse coding of a query image over the bases formed by all learned label representations naturally produces the multi-label annotation, and 2) label ranking, in which the annotated labels are re-ordered according to their sparse coding coefficients over those learned label representations. Notably, these two applications can be performed simultaneously via the label propagation process in sparse coding.
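
The annotation and ranking step described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes one learned representation per label stored as a column of a basis matrix D, uses a generic L1-regularized sparse coding objective solved by ISTA (iterative soft-thresholding), and runs on synthetic data. The names sparse_code and annotate_and_rank, the label set, and the parameter values are all hypothetical.

```python
import numpy as np

def sparse_code(x, D, lam=0.05, n_iter=200):
    """ISTA for min_a 0.5 * ||x - D a||^2 + lam * ||a||_1 (a generic solver,
    assumed here; the paper's own optimizer may differ)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)           # gradient of the quadratic term
        z = a - grad / L                   # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

def annotate_and_rank(x, D, labels, lam=0.05, tol=1e-6):
    """Return (label, coefficient) pairs with nonzero coefficients,
    sorted by decreasing coefficient magnitude."""
    a = sparse_code(x, D, lam)
    order = np.argsort(-np.abs(a))
    return [(labels[i], float(a[i])) for i in order if abs(a[i]) > tol]

# Toy data (assumption): four label representations in a 32-dim feature space,
# orthonormalized so the example has a closed-form answer.
rng = np.random.default_rng(0)
labels = ["sky", "water", "grass", "building"]
D, _ = np.linalg.qr(rng.normal(size=(32, 4)))

# A synthetic query image composed mostly of "sky" with some "grass".
x = 0.8 * D[:, 0] + 0.3 * D[:, 2]

ranked = annotate_and_rank(x, D, labels)
for label, coef in ranked:
    print(f"{label}: {coef:.3f}")
```

In this toy setting the nonzero coefficients select the annotated labels ("sky" and "grass") and their magnitudes give the ranking, so annotation and ranking come out of the same sparse code. With real, correlated label representations the recovered coefficients would not be this clean.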
