Topic Models for Image Annotation and Text Illustration

Image annotation, the task of automatically generating descriptive keywords for a picture, is a key component of many image search and retrieval applications. Creating image databases for model development is, however, costly and time-consuming, since the keywords must be hand-coded and the process repeated for each new collection. In this work we exploit the vast resource of images and documents available on the web to develop image annotation models without any human involvement. We describe a probabilistic model based on the assumption that images and their co-occurring textual data are generated by mixtures of latent topics. We show that this model outperforms previously proposed approaches when applied to image annotation and to the related task of text illustration, despite the noisy nature of our dataset.
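The mixture-of-latent-topics assumption can be illustrated with a minimal sketch. It is not the model described in this work: it only shows how, once an image has been assigned a distribution over topics, candidate keywords could be ranked by p(w | image) = sum over topics k of p(w | k) p(k | image). The function name, topic labels, and all probabilities below are hypothetical toy values chosen for illustration.

```python
def rank_annotation_words(topic_mixture, topic_word_probs, top_n=3):
    """Rank words by p(w | image) = sum_k p(w | k) * p(k | image).

    topic_mixture:    dict mapping topic -> p(topic | image)
    topic_word_probs: dict mapping topic -> {word: p(word | topic)}
    """
    scores = {}
    for topic, p_topic in topic_mixture.items():
        for word, p_word in topic_word_probs[topic].items():
            # Accumulate each topic's contribution to the word's probability.
            scores[word] = scores.get(word, 0.0) + p_topic * p_word
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


# Hypothetical toy distributions (assumed values, not learned from any data).
TOPIC_WORD_PROBS = {
    "sports": {"match": 0.5, "player": 0.3, "stadium": 0.2},
    "politics": {"election": 0.6, "minister": 0.3, "player": 0.1},
}

# Assumed topic proportions inferred for a single image.
image_topics = {"sports": 0.8, "politics": 0.2}

keywords = rank_annotation_words(image_topics, TOPIC_WORD_PROBS)
print(keywords)
```

In this toy setting the same ranking could in principle be read in the other direction for text illustration, by scoring images against the topic proportions of a document, but that reading is an assumption of the sketch rather than a description of the method.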
