Improving Retrieval Effectiveness by Using Key Terms in Top Retrieved Documents

In this paper, we propose a method to improve the precision of top retrieved documents in Chinese information retrieval, where the query is a short description, by re-ordering the documents returned by the initial retrieval. To re-order the documents, we first identify the query's terms and their importance by exploiting information derived from the top N (N <= 30) documents of the initial retrieval; we then re-order the top K (N << K) retrieved documents according to which of these query terms they contain. Specifically, we automatically extract key terms from the top N retrieved documents, collect those key terms that also occur in the query together with their document frequencies within the N documents, and finally use the collected terms to re-order the initially retrieved documents. Each collected term is weighted by its length and its document frequency in the top N retrieved documents, and each document is re-ranked by the sum of the weights of the collected terms it contains. In experiments on the 42 query topics of the NTCIR-3 Cross-Lingual Information Retrieval (CLIR) dataset, our method achieves average improvements of 17.8%-27.5% for the top 10 documents and 6.6%-26.9% for the top 100 documents under relax/rigid relevance judgments and different parameter settings.
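The re-ordering procedure described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: documents are represented as lists of terms, and the combination of term length and document frequency into a single weight (a simple linear mix with hypothetical scaling parameters `alpha` and `beta`) is an assumption, since the abstract does not give the exact weighting formula.

```python
from collections import Counter

def rerank(query_terms, top_n_docs, candidate_docs, alpha=1.0, beta=1.0):
    """Re-order candidate_docs using query terms found in the top-N documents.

    Documents are lists of terms. alpha/beta are illustrative scaling
    parameters, not taken from the paper.
    """
    # Collect query terms that occur in the top-N documents, along with
    # their document frequency among those N documents.
    df = Counter()
    for doc in top_n_docs:
        for term in set(query_terms) & set(doc):
            df[term] += 1

    # Weight each collected term by its length and its document frequency
    # (linear combination is an assumed form).
    weights = {t: alpha * len(t) + beta * df[t] for t in df}

    # Re-rank each candidate document by the sum of the weights of the
    # collected terms it contains.
    def score(doc):
        return sum(weights.get(t, 0.0) for t in set(doc))

    return sorted(candidate_docs, key=score, reverse=True)
```

For example, a candidate document containing several long, frequently co-occurring query terms is promoted above one that matches only a short, rare term.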
