Image Decomposition With Multilabel Context: Algorithms and Applications

Most research on image decomposition, e.g., image segmentation and image parsing, has focused on low-level visual cues within a single image and has neglected contextual information across images. In this paper, we present a new perspective on image decomposition guided by the multilabel context associated with each individual image. Observing that contextual information exists across images (i.e., local representations of the same label are similar, while those of different labels are dissimilar), we propose to perform image decomposition collectively and to obtain an optimal representation for each label from a set of multilabeled images. We formulate this as an optimization problem that maximizes the inter-label difference while minimizing the intra-label difference of the target label representations, and we propose two ways to solve it. Such contextual image decomposition has a wide variety of applications; two exemplary ones, multilabel image annotation and label ranking, are presented and evaluated with different classification techniques. Extensive experiments on two benchmark datasets demonstrate promising results.
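To make the inter-label versus intra-label criterion concrete, the following minimal sketch scores a set of local label representations. It is an illustration only, not the formulation used in the paper: the squared Euclidean distance, the pairwise sums, the `weight` trade-off parameter, and the names `label_scatter` and `contrast_score` are all assumptions introduced here.

```python
from itertools import combinations


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def label_scatter(reps):
    """Return (intra, inter) pairwise scatter.

    reps: list of (label, vector) pairs, one per local label
    representation extracted from some image (hypothetical input format).
    """
    intra = 0.0  # distances between representations sharing a label
    inter = 0.0  # distances between representations of different labels
    for (label_a, vec_a), (label_b, vec_b) in combinations(reps, 2):
        d = sq_dist(vec_a, vec_b)
        if label_a == label_b:
            intra += d
        else:
            inter += d
    return intra, inter


def contrast_score(reps, weight=1.0):
    """Assumed objective: larger is better (high inter, low intra)."""
    intra, inter = label_scatter(reps)
    return inter - weight * intra


# Toy data: two labels, two images each (values are made up).
reps = [
    ("sky", [0.0, 1.0]),
    ("sky", [0.0, 0.8]),
    ("grass", [1.0, 0.0]),
    ("grass", [0.9, 0.1]),
]
# Same vectors with two labels swapped, i.e. a poor decomposition.
swapped = [
    ("sky", [0.0, 1.0]),
    ("grass", [0.0, 0.8]),
    ("sky", [1.0, 0.0]),
    ("grass", [0.9, 0.1]),
]

intra, inter = label_scatter(reps)
good = contrast_score(reps)
bad = contrast_score(swapped)
```

Under these assumptions, a decomposition whose same-label representations cluster together scores higher than one in which labels are mixed, which is the behavior the optimization described above is meant to reward.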
