Text Analysis for Automatic Image Annotation

We present a novel approach to automatically annotating images using their associated text. We first detect and classify all entities (persons and objects) in the text, then determine each entity's salience (its importance in the text) and visualness (the extent to which it can be perceived visually). We combine these two measures to compute the probability that the entity is present in the image. The approach was tested successfully on 50 image-text pairs from Yahoo! News.
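
The combination step can be illustrated with a minimal sketch. The abstract does not state how salience and visualness are combined, so the product used here is an assumption, as is the convention that both scores lie in [0, 1]. The names `Entity`, `presence_score`, and `rank_entities`, and all example values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Entity:
    """An entity detected in the text, with hypothetical scores."""
    name: str
    salience: float    # importance of the entity in the text, assumed in [0, 1]
    visualness: float  # how visually perceivable the entity is, assumed in [0, 1]


def presence_score(entity: Entity) -> float:
    """Estimate how likely the entity is to appear in the image.

    Assumption: the two measures are combined by multiplication.
    The source only says they are "combined".
    """
    for value in (entity.salience, entity.visualness):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    return entity.salience * entity.visualness


def rank_entities(entities: Iterable[Entity]) -> List[Entity]:
    """Order entities from most to least likely to be in the image."""
    return sorted(entities, key=presence_score, reverse=True)


# Illustrative values only; they are not taken from the paper.
entities = [
    Entity("policy", salience=0.5, visualness=0.0),
    Entity("senator", salience=0.8, visualness=1.0),
    Entity("microphone", salience=0.25, visualness=1.0),
]

for entity in rank_entities(entities):
    print(f"{entity.name}: {presence_score(entity):.2f}")
```

Under this assumed combination, an abstract entity such as "policy" scores zero however salient it is, while a visual but marginal entity such as "microphone" ranks below a visual and salient one.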
