External Query Reformulation for Text-Based Image Retrieval

In text-based image retrieval, the Incomplete Annotation Problem (IAP) can greatly degrade retrieval effectiveness. A standard method for addressing this problem is pseudo relevance feedback (PRF), which updates user queries by adding feedback terms selected automatically from the top-ranked documents of a prior retrieval run. PRF assumes that the target collection provides enough feedback information to select effective expansion terms. This is often not the case in image retrieval, since images often have only short metadata annotations, which is what gives rise to the IAP. Our work proposes using an external knowledge resource (Wikipedia) to refine user queries. In our method, Wikipedia documents strongly related to the terms in the user query ("definition documents") are first identified by matching the query against the titles of Wikipedia articles. These definition documents are then used as indicators to re-weight, using the Jaccard coefficient, the feedback documents returned by an initial search run on a Wikipedia abstract collection. The new weights of the feedback documents combine the scores assigned by the different indicators. Query-expansion terms are then selected based on these new weights. Our method is evaluated on the ImageCLEF WikipediaMM image retrieval task using text-based retrieval on the document metadata fields. The results show significant improvement over standard PRF methods.
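The re-weighting step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how indicator scores are combined or how expansion terms are scored, so the sum over definition documents in `reweight` and the weight-accumulating term score in `select_expansion_terms` are assumptions, as are all function names, the whitespace tokenizer, and the toy articles.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return [t for t in text.lower().split() if t.isalpha()]


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| over two token collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_definition_docs(query, articles):
    """Articles whose title equals the query or one of its terms."""
    q = query.lower()
    terms = set(q.split())
    return [a for a in articles
            if a["title"].lower() == q or a["title"].lower() in terms]


def reweight(feedback_docs, definition_docs):
    """Weight each feedback document by its Jaccard overlap with the
    definition documents. Summing over indicators is an assumption."""
    weights = []
    for doc in feedback_docs:
        doc_tokens = tokenize(doc["text"])
        weights.append(sum(jaccard(doc_tokens, tokenize(d["text"]))
                           for d in definition_docs))
    return weights


def select_expansion_terms(query, feedback_docs, weights, k=3):
    """Score each non-query term by the summed weights of the feedback
    documents containing it (assumed scoring rule); return the top k."""
    query_terms = set(tokenize(query))
    scores = Counter()
    for doc, w in zip(feedback_docs, weights):
        for term in sorted(set(tokenize(doc["text"]))):
            if term not in query_terms:
                scores[term] += w
    return [t for t, _ in scores.most_common(k)]


# Toy data standing in for a Wikipedia abstract collection (hypothetical).
articles = [
    {"title": "Jaguar", "text": "jaguar big cat native americas"},
    {"title": "Jaguar Cars", "text": "british car maker luxury vehicles"},
    {"title": "Leopard", "text": "leopard big cat africa asia"},
]
query = "jaguar"
definition_docs = find_definition_docs(query, articles)
feedback_docs = articles[1:]  # pretend these came from an initial search run
weights = reweight(feedback_docs, definition_docs)
expansion = select_expansion_terms(query, feedback_docs, weights, k=5)
```

In this toy run the "Leopard" abstract shares terms with the "Jaguar" definition document and so receives a non-zero weight, while the "Jaguar Cars" abstract shares none and contributes nothing to term selection. This is the intended effect of using definition documents as indicators: feedback documents unrelated to the query's sense are down-weighted before expansion terms are chosen.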
