Query expansion and query reduction in document retrieval

We investigate two seemingly incompatible approaches for improving document retrieval performance in the context of question answering: query expansion and query reduction. Queries are expanded by generating lexical paraphrases, a process guided by syntactic, semantic and corpus-based frequency information. Queries are reduced by removing words that may detract from retrieval performance; the features that identify these words were obtained from decision graphs. Both approaches were evaluated on a subset of queries from TREC-8, TREC-9 and TREC-10. Our evaluation shows that each approach improves retrieval performance in isolation, and that the two together yield substantial improvements. Specifically, query expansion followed by reduction improved the average number of correct documents retrieved by 21.7% and the average number of queries that can be answered by 15%.
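To make the expand-then-reduce pipeline concrete, here is a minimal sketch under loudly stated assumptions. The synonym table `SYNONYMS`, the frequency table `CORPUS_FREQ` and the drop list `DROP_WORDS` are hypothetical toy stand-ins: the actual system draws on richer syntactic, semantic and corpus-based information for expansion and on features learned from decision graphs for reduction. The sketch only shows the overall shape: enumerate lexical paraphrases, rank them by a frequency score, keep the top k, then strip words assumed to hurt retrieval.

```python
from itertools import product

# Hypothetical toy resources; NOT the lexical data used in the paper.
SYNONYMS = {
    "author": ["writer"],
    "wrote": ["penned", "authored"],
}
CORPUS_FREQ = {
    "who": 900, "author": 120, "writer": 300, "wrote": 400,
    "penned": 20, "authored": 60, "hamlet": 50,
}
# Hypothetical hand-written reduction rule standing in for decision-graph features.
DROP_WORDS = {"who", "what", "when", "where", "which", "how", "the", "a", "of"}


def expand(query, k=3):
    """Return up to k lexical paraphrases, ranked by summed corpus frequency."""
    words = query.lower().split()
    # Each position may keep its word or swap in one of its listed synonyms.
    options = [[w] + SYNONYMS.get(w, []) for w in words]
    paraphrases = [list(p) for p in product(*options)]
    paraphrases.sort(key=lambda p: sum(CORPUS_FREQ.get(w, 0) for w in p), reverse=True)
    return paraphrases[:k]


def reduce_query(words):
    """Remove words assumed (by the toy rule) to detract from retrieval."""
    return [w for w in words if w not in DROP_WORDS]


def expand_then_reduce(query, k=3):
    """Expansion followed by reduction, applied to every paraphrase."""
    return [reduce_query(p) for p in expand(query, k)]


if __name__ == "__main__":
    for q in expand_then_reduce("who wrote hamlet", k=2):
        print(" ".join(q))
```

With these toy tables, `expand_then_reduce("who wrote hamlet", k=2)` keeps the two highest-scoring paraphrases and drops the question word from each, giving `[["wrote", "hamlet"], ["authored", "hamlet"]]`. Ranking by summed frequency is just one simple scoring choice made for illustration.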
