Paper Information - University of Chicago at NTCIR4 CLIR: Multi-Scale Query Expansion

Pseudo-relevance feedback, while useful in monolingual applications for refining and enriching short user queries, proves even more important in cross-language information retrieval (CLIR). For CLIR, query expansion before and after translation provides an opportunity to recover from translation gaps, reduce ambiguity, and enhance recall. Furthermore, for CLIR in unsegmented Asian languages, appropriate unit selection for translation, indexing, and retrieval plays a key role. In our NTCIR4 CLIR experiments, we compare the effectiveness of different unit selection strategies (words and subword units) and of different stages (pre- and post-translation) for query expansion. We find that for the very short queries with many untranslatable words in this test collection, both pre- and post-translation query expansion, independently and in conjunction, significantly enhance retrieval effectiveness for all unit selection strategies. In merged multilingual runs, we find no significant differences across unit selection strategies for expansion. More detailed per-language analysis, however, shows significantly better effectiveness in Japanese when character-bigram units are used to identify presumed relevant documents during query expansion and both word and bigram units are chosen as expansion terms, compared with approaches that use word-based units to identify relevant documents.
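To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch of overlapping character-bigram extraction and of pseudo-relevance feedback. It is not the system used in the paper. The function names (`char_bigrams`, `expand_query`), the parameters `top_docs` and `top_terms`, and the selection of expansion terms by raw frequency are all assumptions made for illustration; a real system would typically start from an actual retrieval run and use a weighted term-selection scheme.

```python
from collections import Counter


def char_bigrams(text):
    """Return overlapping character bigrams, ignoring whitespace.

    Hypothetical helper: a simple stand-in for subword unit selection
    in unsegmented text.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def expand_query(query_terms, ranked_docs, top_docs=2, top_terms=3):
    """Toy pseudo-relevance feedback.

    Assumes the first `top_docs` documents of an initial ranking are
    relevant, then appends their most frequent terms that are not
    already in the query. Raw frequency is an illustrative assumption,
    not the paper's term-selection method.
    """
    counts = Counter()
    for doc_terms in ranked_docs[:top_docs]:
        counts.update(t for t in doc_terms if t not in query_terms)
    expansion = [term for term, _ in counts.most_common(top_terms)]
    return list(query_terms) + expansion


if __name__ == "__main__":
    # Bigram units for a short unsegmented Japanese string.
    print(char_bigrams("情報検索"))

    # Documents are given as term lists in initial-ranking order.
    ranking = [["a", "b", "b", "c"], ["b", "d"], ["z", "z", "z"]]
    print(expand_query(["a"], ranking, top_docs=2, top_terms=1))
```

In this sketch the same `expand_query` function could be applied to source-language documents before translation or to target-language documents after translation, and the term lists could hold words, bigrams, or both, which mirrors the combinations the abstract compares.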

Gina-Anne Levow
