Fusion of Region and Image-Based Techniques for Automatic Image Annotation

We propose a concept-centered approach that combines region- and image-level analysis for automatic image annotation (AIA). At the region level, we group regions by concept and cluster the regions of each concept separately. The key idea is to use the inter- and intra-concept distribution of regions to eliminate unreliable region clusters and identify the main region clusters for each concept. From these clusters we derive the correspondence between image region clusters and concepts. To further improve the accuracy of the AIA task, we apply a multi-stage kNN classification to global features at the image level. Finally, we fuse the region- and image-level analyses to obtain the final annotations. Compared with the best reported AIA results on the Corel image data set, our approach improves recall by 18.5% and the “number of concepts detected” by 8.3%.
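The pipeline described above can be illustrated with a small, self-contained Python sketch (Python 3.8+ for `math.dist`). The abstract does not specify the clustering algorithm, the exact inter-/intra-concept criterion, the kNN stages, or the fusion rule, so each of these is an assumption here: plain k-means stands in for concept-centered region clustering; a hypothetical neighborhood-purity rule (`n_neighbors`, `min_purity`) stands in for the reliability test, keeping a cluster only if most regions near its centroid, pooled over all concepts, carry that cluster's concept; a single-stage kNN replaces the multi-stage kNN; and a linear weight `alpha` fuses the two score sets. All function names are illustrative, not the authors' code.

```python
# Hypothetical sketch of region-level / image-level fusion for AIA.
# Assumptions (not taken from the paper): Euclidean features, plain k-means
# per concept, a neighborhood-purity rule for discarding unreliable region
# clusters, single-stage kNN on global features (the paper uses a
# multi-stage kNN), and linear score fusion with weight `alpha`.
import math
from collections import Counter


def kmeans(points, k, iters=10):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            groups[j].append(p)
        # Recompute each centroid; keep the old one if its group is empty.
        centroids = [[sum(xs) / len(g) for xs in zip(*g)] if g else centroids[j]
                     for j, g in enumerate(groups)]
    return centroids


def reliable_clusters(regions_by_concept, k=2, n_neighbors=3, min_purity=0.6):
    """Cluster each concept's regions separately, then keep a cluster only if
    most of the regions nearest its centroid (pooled over all concepts) carry
    that concept's label. Returns a list of (centroid, concept) pairs."""
    pool = [(r, c) for c, rs in regions_by_concept.items() for r in rs]
    kept = []
    for concept, regions in regions_by_concept.items():
        for centroid in kmeans(regions, min(k, len(regions))):
            nearest = sorted(pool, key=lambda rc: math.dist(centroid, rc[0]))[:n_neighbors]
            purity = sum(c == concept for _, c in nearest) / len(nearest)
            if purity >= min_purity:
                kept.append((centroid, concept))
    return kept


def region_scores(image_regions, clusters):
    """Map each region to its nearest reliable cluster and vote for its concept."""
    votes = Counter(min(clusters, key=lambda cc: math.dist(r, cc[0]))[1]
                    for r in image_regions)
    return {c: v / len(image_regions) for c, v in votes.items()}


def image_scores(global_feat, train_images, k=2):
    """kNN over global features; train_images holds (feature, concept_set) pairs."""
    nearest = sorted(train_images, key=lambda t: math.dist(global_feat, t[0]))[:k]
    votes = Counter(c for _, concepts in nearest for c in concepts)
    return {c: v / len(nearest) for c, v in votes.items()}


def annotate(image_regions, global_feat, clusters, train_images,
             alpha=0.5, k=2, top_n=2):
    """Linearly fuse region- and image-level scores; return the top_n concepts."""
    rs = region_scores(image_regions, clusters)
    gs = image_scores(global_feat, train_images, k)
    fused = {c: alpha * rs.get(c, 0.0) + (1 - alpha) * gs.get(c, 0.0)
             for c in set(rs) | set(gs)}
    return sorted(fused, key=lambda c: (-fused[c], c))[:top_n]


# Toy data: regions inherit their image's labels, so a grass-like region
# (0.95, 0.05) ends up in the "sky" pool and should be pruned.
regions_by_concept = {
    "sky": [(0.0, 1.0), (0.95, 0.05), (0.1, 0.9), (0.0, 0.8)],
    "grass": [(1.0, 0.0), (0.9, 0.1), (0.8, 0.0)],
}
train_images = [((0.2,), {"sky", "sea"}), ((0.25,), {"sky", "grass"}),
                ((0.9,), {"grass", "cow"})]

clusters = reliable_clusters(regions_by_concept)
labels = annotate([(0.05, 0.95), (0.9, 0.05)], (0.22,), clusters, train_images)
print(labels)  # ['sky', 'grass']
```

In the toy data, the grass-like region in the `sky` pool forms its own cluster whose three nearest regions are itself and two grass regions, giving a purity of 1/3, so it is discarded while the genuine sky and grass clusters survive. Raising `alpha` shifts weight toward the region-level evidence; lowering it favors the global-feature kNN.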