Information Retrieval for Question Answering

The Sheffield Natural Language Processing group has developed a system for automatic question answering, which has been used in the 1999, 2000 and 2002 Text REtrieval Conferences (TRECs). It is composed of an information retrieval system, which selects from a large collection of text those parts likely to contain an answer to a question, and an information extraction system, which processes this text to extract the answer. This dissertation builds on previous work by Sam Scott to determine the optimal settings for the information retrieval part of the system. These experiments suggest that good-quality passage retrieval is essential to the performance of the question answering system, so a framework is developed to support passage retrieval on systems that do not support it natively. The performance of this system is evaluated under a variety of information retrieval approaches.
