Document expansion for image retrieval

Successful information retrieval requires effective matching between the user's search request and the contents of relevant documents. Often the request entered by a user does not use the same topically relevant terms as the authors of these documents. One potential approach to this query-document term mismatch is document expansion: adding topically relevant indexing terms to a document so that it is more likely to be retrieved for relevant queries that match its original contents poorly. We propose and evaluate a new document expansion method using external resources. While previous research has been inconclusive about the impact of document expansion on retrieval effectiveness, our method is shown to work effectively for text-based image retrieval of short image annotation documents. Our approach uses the Okapi query expansion algorithm as a method for document expansion. We further show that improved performance can be achieved with a "document reduction" approach that includes only the significant terms of a document in the expansion process. Our experiments on the WikipediaMM task at ImageCLEF 2008 show an increase of 16.5% in mean average precision (MAP) compared to a variation of the Okapi BM25 retrieval model. To compare document expansion with query expansion, we also test query expansion from an external resource, which leads to an improvement of 9.84% in MAP over our baseline. We conclude that document expansion with document reduction, in combination with query expansion, produces the overall best retrieval results for short-length document retrieval. For this image retrieval task, we also conclude that query expansion from external resources does not outperform the document expansion method.
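
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, under stated assumptions. It treats a reduced version of a short annotation as a query against an external text collection, takes the top-ranked external documents as feedback, and selects expansion terms with the Robertson offer weight commonly used in Okapi-style query expansion. The IDF-based reduction criterion, the parameter values (`keep`, `top_docs`, `n_terms`, `k1`, `b`), the function names, and the toy external collection are all illustrative assumptions, not the settings used in the experiments.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def idf(term, df, n_docs):
    # Smoothed inverse document frequency; positive for every term.
    return math.log((n_docs + 1) / (df[term] + 0.5))


def reduce_document(tokens, df, n_docs, keep):
    # "Document reduction" (assumed criterion): keep the `keep` rarest terms
    # that occur at least once in the external resource.
    known = {t for t in tokens if df[t] > 0}
    return sorted(known, key=lambda t: (-idf(t, df, n_docs), t))[:keep]


def bm25_rank(query_terms, corpus, df, k1=1.2, b=0.75):
    # Rank external documents against the reduced document with BM25.
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    scored = []
    for idx, doc in enumerate(corpus):
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            if tf[t]:
                norm = tf[t] + k1 * (1 - b + b * len(doc) / avg_len)
                score += idf(t, df, n_docs) * tf[t] * (k1 + 1) / norm
        if score > 0:
            scored.append((score, idx))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scored]


def expansion_terms(feedback, df, n_docs, exclude, n_terms):
    # Robertson offer weight: r * RW, where RW is the Robertson/Sparck Jones
    # relevance weight and r counts feedback documents containing the term.
    R = len(feedback)
    r = Counter(t for doc in feedback for t in set(doc))
    weights = {}
    for t, r_t in r.items():
        if t in exclude:
            continue
        n_t = df[t]
        rw = math.log(((r_t + 0.5) * (n_docs - n_t - R + r_t + 0.5))
                      / ((n_t - r_t + 0.5) * (R - r_t + 0.5)))
        weights[t] = r_t * rw
    return sorted(weights, key=lambda t: (-weights[t], t))[:n_terms]


def expand_document(text, external_texts, keep=2, top_docs=2, n_terms=2):
    doc = tokenize(text)
    corpus = [tokenize(t) for t in external_texts]
    n_docs = len(corpus)
    df = Counter(t for d in corpus for t in set(d))
    query = reduce_document(doc, df, n_docs, keep)
    feedback = [corpus[i] for i in bm25_rank(query, corpus, df)[:top_docs]]
    if not feedback:
        return doc  # nothing retrieved, leave the document unchanged
    return doc + expansion_terms(feedback, df, n_docs, set(doc), n_terms)


# Toy external resource (hypothetical data, for illustration only).
external = [
    "eiffel tower paris france iron landmark",
    "eiffel tower paris gustave architect",
    "paris france capital seine",
    "golden gate bridge san francisco",
    "tower bridge london thames",
    "london england capital thames",
]
expanded = expand_document("eiffel tower at night", external)
print(expanded)
```

In this toy run the annotation gains terms drawn from the external documents that best match its reduced form, so a query such as "paris" could now match it even though the original annotation never mentions the city. Reducing the document first keeps terms that carry little topical signal from steering the feedback retrieval.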
