Word Co-occurrence and Markov Random Fields for Improving Automatic Image Annotation

In this paper we propose a novel approach for improving automatic image annotation methods. The approach is motivated by the observation that current annotation methods have low accuracy when only the most confident label is considered, whereas accuracy improves when the correct label is sought within the set of the top k candidate labels. We take advantage of this observation and propose a Markov random field (MRF) based on word co-occurrence information for improving annotation systems. Through the MRF structure we take into account spatial dependencies between connected regions and, as a result, semantic relationships between their labels. We performed experiments with iterated conditional modes and simulated annealing as optimization strategies on a subset of the Corel benchmark collection. Experimental results of the proposed method, with a k-nearest-neighbors classifier as the annotation method, show important error reductions.
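
The idea of re-ranking top-k candidate labels with an MRF can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the energy form (negative log classifier confidence plus weighted negative log co-occurrence probability), the function name `icm_relabel`, the weight `lam`, the fallback value for unseen label pairs, and the toy data. It uses iterated conditional modes, one of the two optimization strategies named above.

```python
import math


def icm_relabel(candidates, neighbors, cooc, lam=1.0, max_iters=10):
    """Pick one label per region by iterated conditional modes (ICM).

    candidates: one dict per region, mapping candidate label -> confidence in (0, 1]
    neighbors:  one list per region with the indices of adjacent regions
    cooc:       dict mapping (label_a, label_b) -> co-occurrence probability
    lam:        hypothetical weight balancing classifier and co-occurrence terms
    """
    # Start from the classifier's top-1 label for every region.
    labels = [max(c, key=c.get) for c in candidates]

    def local_energy(i, lab):
        # Assumed energy: low when the classifier is confident and when the
        # label co-occurs often with the current labels of adjacent regions.
        unary = -math.log(candidates[i][lab])
        pair = sum(-math.log(cooc.get((lab, labels[j]), 1e-6))
                   for j in neighbors[i])
        return unary + lam * pair

    for _ in range(max_iters):
        changed = False
        for i in range(len(candidates)):
            # Greedy update: take the candidate with the lowest local energy.
            best = min(candidates[i], key=lambda lab: local_energy(i, lab))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # converged to a local minimum
            break
    return labels


def symmetric(pairs):
    """Make a co-occurrence table usable in both label orders."""
    table = dict(pairs)
    table.update({(b, a): p for (a, b), p in pairs.items()})
    return table


# Toy data (invented): three regions in a chain, two candidates each.
candidates = [
    {"sky": 0.8, "water": 0.2},
    {"plane": 0.55, "tiger": 0.45},
    {"grass": 0.7, "sand": 0.3},
]
neighbors = [[1], [0, 2], [1]]
cooc = symmetric({
    ("sky", "tiger"): 0.05, ("sky", "plane"): 0.30,
    ("water", "tiger"): 0.05, ("water", "plane"): 0.02,
    ("tiger", "grass"): 0.40, ("tiger", "sand"): 0.05,
    ("plane", "grass"): 0.02, ("plane", "sand"): 0.02,
})

result = icm_relabel(candidates, neighbors, cooc)
print(result)  # ['sky', 'tiger', 'grass']
```

In this toy case the classifier's top-1 label for the middle region is "plane", but the strong co-occurrence of "tiger" with the neighboring "grass" region flips it to the second candidate. ICM is greedy and stops at a local minimum; a simulated annealing variant would instead accept some energy-increasing changes with a probability that shrinks as a temperature parameter is lowered.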
