Image annotation with parametric mixture model based multi-class multi-labeling

Image annotation, which labels an image with a set of semantic terms to bridge the semantic gap between low-level features and high-level semantics in visual information retrieval, is generally posed as a classification problem. Recently, multi-label classification has been investigated for image annotation, since an image contains rich content and can be associated with multiple concepts (i.e., labels). In this paper, a parametric mixture model based multi-class multi-labeling approach is proposed to tackle image annotation. Instead of building classifiers that learn individual labels exclusively, we model images with parametric mixture models so that the mixture characteristics of labels can be exploited simultaneously in both the training and annotation processes. The proposed method has been benchmarked against several state-of-the-art methods and achieves promising results.
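To make the modeling idea concrete, the sketch below illustrates one way a PMM1-style parametric mixture model could be trained and applied for multi-label annotation. It assumes each image is represented as a bag-of-visual-words count vector, uses equal mixing weights over an image's active labels, fits per-label multinomials with an EM-style update, and annotates a new image by ranking individual labels by log-likelihood. The function names, the update rule, and the per-label ranking are illustrative assumptions, not the exact formulation used in the paper.

```python
# Minimal sketch of a PMM1-style parametric mixture model for multi-label
# annotation, assuming bag-of-visual-words count vectors per image.
# Names and the annotation ranking are illustrative, not the authors' method.
import numpy as np

def train_pmm(X, Y, n_iters=50, smoothing=1e-2):
    """Estimate per-label word distributions theta (L x V).

    X : (N, V) array of visual-word counts per image.
    Y : (N, L) binary array of label assignments per image.
    Under PMM1, an image with label set S is generated by an equal-weight
    mixture of the multinomials of the labels in S.
    """
    N, V = X.shape
    L = Y.shape[1]
    rng = np.random.default_rng(0)
    theta = rng.random((L, V)) + 1.0
    theta /= theta.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        counts = np.full((L, V), smoothing)
        for n in range(N):
            active = np.flatnonzero(Y[n])                  # labels attached to image n
            mix = theta[active]                            # (|S|, V) component multinomials
            resp = mix / mix.sum(axis=0, keepdims=True)    # E-step: word responsibilities
            counts[active] += resp * X[n]                  # M-step sufficient statistics
        theta = counts / counts.sum(axis=1, keepdims=True)
    return theta

def annotate(x, theta, top_k=5):
    """Rank labels for a new count vector x by per-label log-likelihood.

    The full approach would score candidate label *sets* under the mixture;
    ranking single labels is a simplification for illustration.
    """
    loglik = (np.log(theta) * x).sum(axis=1)
    return np.argsort(loglik)[::-1][:top_k]

# Toy usage: 3 images over a 4-word visual vocabulary and 3 concepts.
X = np.array([[5, 1, 0, 0], [0, 4, 3, 0], [2, 0, 0, 6]])
Y = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
theta = train_pmm(X, Y)
print(annotate(np.array([4, 2, 0, 1]), theta, top_k=3))
```

Because each training image contributes jointly to the distributions of all labels attached to it, the label mixture is exploited during learning rather than treating each label as an independent binary problem, which is the intuition the abstract describes.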
