Automated Image Annotation Using Global Features and Robust Nonparametric Density Estimation

This paper describes a simple framework for automatically annotating images using nonparametric models of the distributions of image features. We show that, under this framework, simple image properties such as global colour and texture distributions provide a strong basis for reliably annotating images. We report results on subsets of two photographic libraries, the Corel Photo Archive and the Getty Image Archive. We also show how the popular Earth Mover’s Distance measure can be incorporated effectively within this framework.
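The general idea of annotating with a nonparametric density model can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`keyword_scores`, `emd_1d`, `l1_distance`), the exponential kernel, the bandwidth value, and the toy three-bin "colour histograms" are all assumptions made for illustration. Each keyword's score is a kernel estimate of the feature density among training images carrying that keyword, multiplied by the keyword's empirical prior; the kernel's normalising constant is omitted because it is the same for every keyword and cancels when the scores are normalised. The distance inside the kernel is pluggable, so a one-dimensional Earth Mover's Distance (the L1 distance between cumulative histograms, valid for equal-mass histograms with unit-spaced bins) can be swapped in.

```python
import numpy as np

def l1_distance(a, b):
    """L1 distance between two feature vectors."""
    return float(np.abs(a - b).sum())

def emd_1d(a, b):
    """Earth Mover's Distance between two 1-D histograms of equal total
    mass with unit-spaced bins: the L1 distance between their CDFs."""
    return float(np.abs(np.cumsum(a) - np.cumsum(b)).sum())

def keyword_scores(query, train_features, train_keywords,
                   bandwidth=0.5, distance=l1_distance):
    """Score each keyword w by (kernel density of query given w) * prior(w).

    bandwidth and the exponential kernel exp(-d / h) are illustrative
    assumptions, not values taken from the paper.
    """
    # Kernel weight of the query against every training image.
    k = np.array([np.exp(-distance(query, x) / bandwidth)
                  for x in train_features])
    vocab = sorted({w for kws in train_keywords for w in kws})
    n = len(train_features)
    scores = {}
    for w in vocab:
        mask = np.array([w in kws for kws in train_keywords])
        density = k[mask].mean()      # unnormalised estimate of f(x | w)
        prior = mask.sum() / n        # empirical P(w)
        scores[w] = density * prior
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}

# Toy data (hypothetical): three-bin global colour histograms.
train_features = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.2, 0.7],
])
train_keywords = [{"sky"}, {"sky", "sea"}, {"grass"}, {"grass"}]
query = np.array([0.75, 0.15, 0.10])

scores = keyword_scores(query, train_features, train_keywords)
scores_emd = keyword_scores(query, train_features, train_keywords,
                            distance=emd_1d)
best = max(scores, key=scores.get)
best_emd = max(scores_emd, key=scores_emd.get)
print(best, best_emd)
```

In this toy setting the query histogram sits close to the two images labelled "sky", so that keyword receives the highest score under either distance. A real system would need a principled bandwidth choice and, for multi-dimensional signatures, a general EMD solver rather than the one-dimensional shortcut used here.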
