Paper information - DCU at WikipediaMM 2009: Document Expansion from Wikipedia Abstracts

In this paper, we describe our participation in the WikipediaMM task at CLEF 2009. Our main effort concerns expanding the image metadata using DBpedia, a collection of Wikipedia abstracts. Because the metadata is too short to match query words reliably, we expand it with a typical query expansion method: in our experiments, we use the Rocchio algorithm for document expansion. Our best run ranked 26th of 57 runs, which is below our expectations. We think the main reason is that our document expansion method uses all the words in the metadata documents, including words that are unrelated to the content of the images. Compared with our text retrieval baseline, our best document expansion run improves MAP by 11.17%. We conclude that document expansion can be an effective factor in the image metadata retrieval task. Our content-based image retrieval uses the same approach as in our participation in ImageCLEF 2008.
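
The idea of Rocchio-style document expansion can be illustrated with a minimal sketch. It is not the authors' implementation: the TF-IDF weighting, the cosine ranking, the whitespace tokenizer, the function names (`expand_document`, `tf_idf_vectors`), and the parameter values (`alpha`, `beta`, `top_docs`, `top_terms`) are all assumptions chosen for illustration. The sketch treats the image metadata as a query against a collection of abstracts, takes the top-ranked abstracts as pseudo-relevant, and appends the highest-weighted new terms to the metadata.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems also stem and remove stopwords)."""
    return [t for t in text.lower().split() if t.isalpha()]


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per tokenized document, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty or all-zero."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_document(metadata, abstracts, top_docs=2, top_terms=3, alpha=1.0, beta=0.75):
    """Expand a short metadata string with terms from the most similar abstracts.

    Rocchio without the negative term:
        expanded = alpha * original + beta * centroid(pseudo-relevant abstracts)
    All parameter defaults are illustrative assumptions.
    """
    meta_tokens = tokenize(metadata)
    vectors, idf = tf_idf_vectors([tokenize(a) for a in abstracts])
    meta_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(meta_tokens).items()}

    # Use the metadata as a query and keep the best-matching abstracts.
    ranked = sorted(range(len(abstracts)),
                    key=lambda i: cosine(meta_vec, vectors[i]),
                    reverse=True)[:top_docs]

    # Rocchio update over the pseudo-relevant abstracts.
    expanded = {t: alpha * w for t, w in meta_vec.items()}
    for i in ranked:
        for t, w in vectors[i].items():
            expanded[t] = expanded.get(t, 0.0) + beta * w / len(ranked)

    # Append only the highest-weighted terms that the metadata did not already contain.
    new_terms = sorted((t for t in expanded if t not in meta_vec),
                       key=lambda t: (-expanded[t], t))[:top_terms]
    return meta_tokens + new_terms
```

Calling `expand_document("eiffel tower at night", abstracts)` with a small list of abstract strings returns the original metadata tokens followed by the selected expansion terms. Because the whole metadata string drives the ranking, words in it that do not describe the image can pull in unrelated abstracts, which is the weakness the abstract identifies.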

Gareth J. F. Jones | Peter Wilkins | Johannes Leveling | Jinming Min
