A revisit of Generative Model for Automatic Image Annotation using Markov Random Fields

Much research effort on automatic image annotation (AIA) has focused on generative models, owing to their well-founded theory and their competitive performance compared with many well-designed and sophisticated methods. However, when semantic context is taken into account for annotation, these models suffer from weak learning ability. This is mainly because the traditional generative model lacks both the parameters and an appropriate learning strategy for characterizing semantic context. In this paper, we present a new approach based on multiple Markov random fields (MRFs) for semantic context modeling and learning. Unlike previous MRF-based AIA approaches, we explore optimal parameter estimation and model inference systematically to leverage the learning power of the traditional generative model. Specifically, we propose a new potential function for site modeling based on the generative model and build a local graph for each annotation keyword. Parameter estimation and model inference are performed in a locally optimal sense. We conduct experiments on commonly used benchmarks. On the Corel 5000-image dataset, we achieve a recall of 0.36 and a precision of 0.31 on 263 keywords. This is a significant improvement over the best previously reported results of state-of-the-art approaches.
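To make the idea of site potentials, keyword graphs, and locally optimal inference concrete, the following is a minimal sketch rather than the method of the paper. It assumes binary keyword labels, a site cost equal to the negative log of a generative score p(w | image), a fixed reward `weight` for labelling two linked keywords together, and iterated conditional modes (ICM) as the local optimizer. The function names, the example scores, and the single edge are all hypothetical.

```python
import math

def site_potential(p_word, label):
    """Unary cost: negative log of an assumed generative score p(w | image)."""
    p = min(max(p_word, 1e-9), 1.0 - 1e-9)  # clamp to avoid log(0)
    return -math.log(p) if label == 1 else -math.log(1.0 - p)

def energy(labels, probs, edges, weight):
    """Sum of site costs, minus a reward for each edge whose two keywords are both on."""
    total = sum(site_potential(probs[w], labels[w]) for w in labels)
    for a, b in edges:
        if labels[a] == 1 and labels[b] == 1:
            total -= weight
    return total

def icm(probs, edges, weight=1.0, max_sweeps=10):
    """Iterated conditional modes: flip one keyword at a time while energy decreases."""
    labels = {w: int(p > 0.5) for w, p in probs.items()}  # start from thresholded scores
    for _ in range(max_sweeps):
        changed = False
        for w in labels:
            best = min((0, 1),
                       key=lambda l: energy({**labels, w: l}, probs, edges, weight))
            if best != labels[w]:
                labels[w] = best
                changed = True
        if not changed:  # reached a local optimum
            break
    return labels

# Hypothetical generative scores and one context edge between two keywords.
probs = {"sky": 0.9, "plane": 0.4, "grass": 0.1}
edges = [("sky", "plane")]
result = icm(probs, edges, weight=1.0)
```

In this toy setting the context edge lets a confident keyword pull in a weaker but related one, which thresholding the generative scores alone would miss. ICM only guarantees a local optimum, which is in keeping with the locally optimal estimation and inference described in the abstract.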
