Exploiting spatial context constraints for automatic image region annotation

In this paper we conduct a fairly comprehensive study of how to exploit spatial context constraints for automatic image region annotation. We present a straightforward method to regularize the segmented regions into a 2D lattice layout, so that simple grid-structured graphical models can be employed to characterize the spatial dependencies. We show how to represent the spatial context constraints in various graphical models and present the related learning and inference algorithms. Unlike most existing work, we specifically investigate how to combine the classification performance of discriminative learning with the representation capability of graphical models. To evaluate the proposed approaches reliably, we create a moderate-scale image set with region-level ground truth. The experimental results show that (i) spatial context constraints do help produce accurate region annotation, (ii) the approaches that combine the merits of discriminative learning and context constraints perform best, and (iii) image retrieval can benefit from accurate region-level annotation.
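As an illustration of the lattice regularization idea, the following minimal sketch maps a pixel-level segmentation onto a fixed grid and lists the neighbor links that a grid-structured graphical model would use. The abstract does not specify the assignment rule, so the majority-vote rule, the function names `regions_to_lattice` and `grid_edges`, and the 4-neighbor connectivity are all assumptions made here for illustration, not the paper's method.

```python
from collections import Counter


def regions_to_lattice(region_map, rows, cols):
    """Assign each lattice cell the region id that covers most of its pixels.

    Assumption (not from the paper): majority vote per cell.
    Requires rows <= image height and cols <= image width so no cell is empty.
    """
    height, width = len(region_map), len(region_map[0])
    lattice = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        row = []
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            counts = Counter(
                region_map[y][x] for y in range(y0, y1) for x in range(x0, x1)
            )
            row.append(counts.most_common(1)[0][0])
        lattice.append(row)
    return lattice


def grid_edges(rows, cols):
    """List 4-neighbor edges (right and down) between adjacent lattice cells."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                edges.append(((r, c), (r, c + 1)))
            if r + 1 < rows:
                edges.append(((r, c), (r + 1, c)))
    return edges


# Hypothetical 4x4 segmentation with three region ids (0, 1, 2).
example_map = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 1, 1],
    [2, 2, 2, 1],
]
example_lattice = regions_to_lattice(example_map, rows=2, cols=2)
example_edges = grid_edges(2, 2)
```

Each lattice cell then becomes one node of the graphical model, and each edge carries a pairwise term that can encode a spatial constraint between the labels of adjacent cells.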
