Exploiting redundancy in question answering

Our goal is to automatically answer brief factual questions of the form "When was the Battle of Hastings?" or "Who wrote The Wind in the Willows?". Since the answer to nearly any such question can now be found somewhere on the Web, the problem reduces to finding potential answers in large volumes of data and validating their accuracy. We apply a method for arbitrary passage retrieval to the first half of the problem and demonstrate that answer redundancy can be used to address the second half. The success of our approach depends on the premise that the volume of available Web data is large enough to supply the answer to most factual questions multiple times and in multiple contexts. A query is generated from each question and used to select, from a large collection of Web data, short passages that may contain the answer. These passages are analyzed to identify candidate answers, and the frequency of these candidates within the passages is used to "vote" for the most likely answer. The approach is evaluated experimentally on questions taken from the TREC-9 question-answering test collection and, as an additional demonstration, is extended to answer multiple-choice trivia questions of the kind asked in quizzes and television game shows.
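
The pipeline described above (query generation, passage retrieval, candidate extraction, frequency voting) can be illustrated with a minimal Python sketch. The helper names (generate_query, extract_candidates, answer), the tiny stoplist, and the capitalized-phrase/number heuristics are assumptions made here for illustration only; the paper's actual passage-retrieval engine and answer-typing machinery are considerably more sophisticated.

```python
from collections import Counter
import re

# Function words dropped during query generation and excluded as answer
# candidates. (A hypothetical, deliberately tiny stoplist.)
STOPWORDS = {"when", "was", "the", "who", "wrote", "what", "is", "of",
             "a", "an", "in", "on", "at", "did", "where", "how"}

def generate_query(question):
    """Keep the question's content terms as the retrieval query."""
    terms = re.findall(r"[A-Za-z']+", question.lower())
    return [t for t in terms if t not in STOPWORDS]

def extract_candidates(passage, query_terms):
    """Treat capitalized phrases and 3-4 digit numbers as candidate
    answers, excluding query terms and stopwords. A real system would
    use answer-type-specific patterns (dates, person names, etc.)."""
    raw = re.findall(r"\b(?:[A-Z][a-z']+(?:\s+[A-Z][a-z']+)*|\d{3,4})\b",
                     passage)
    return [c for c in raw
            if c.lower() not in query_terms and c.lower() not in STOPWORDS]

def answer(question, passages):
    """Vote: the candidate appearing in the most passages wins."""
    query_terms = generate_query(question)
    votes = Counter()
    for p in passages:
        # set() grants each candidate at most one vote per passage, so
        # redundancy across passages, not repetition within one, decides.
        votes.update(set(extract_candidates(p, query_terms)))
    return votes.most_common(1)[0][0] if votes else None

# Toy demonstration with hand-written "retrieved passages"; the paper
# draws such passages from a large Web collection instead.
passages = [
    "The Battle of Hastings was fought in 1066 near Hastings.",
    "In 1066 William of Normandy defeated Harold at Hastings.",
    "The battle took place on 14 October 1066.",
]
print(answer("When was the Battle of Hastings?", passages))  # -> 1066
```

In this toy run "1066" recurs in all three passages while spurious candidates appear only once, so voting selects it; this is the redundancy effect the abstract describes.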
