Automatic Question Answering Using the Web: Beyond the Factoid

In this paper we describe and evaluate a Question Answering (QA) system that goes beyond answering factoid questions. Our approach places no restrictions on the type of question handled and does not assume that the answers are factoids. We present an unsupervised approach for collecting question/answer pairs from FAQ pages, which we use to build a corpus of 1 million question/answer pairs from the Web. This corpus is used to train the statistical models employed by our QA system: a statistical chunker, which transforms a question posed in natural language into a phrase-based query submitted for exact match to an off-the-shelf search engine; an answer/question translation model, which assesses the likelihood that a proposed answer is indeed an answer to the posed question; and an answer language model, which assesses the likelihood that a proposed answer is well formed. We evaluate the system in a modular fashion, comparing baseline algorithms against our proposed algorithms for individual modules. The evaluation shows that our system achieves reasonable answer accuracy on a large variety of complex, non-factoid questions.
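The three components named above can be illustrated with a minimal sketch. It is not the system's implementation: the function names, the IBM Model 1-style translation score, the add-one smoothed unigram language model, and the plain sum of the two log-probabilities are all assumptions made for illustration. Chunks are taken as given, since the trained chunker itself is not reproduced here.

```python
import math
from collections import Counter

def phrase_query(chunks):
    """Build a search query from question chunks (hypothetical helper).
    Multi-word chunks are quoted so the engine matches them exactly."""
    return " ".join(f'"{c}"' if " " in c else c for c in chunks)

def translation_logprob(question, answer, t_table, floor=1e-6):
    """IBM Model 1-style log P(question | answer) (assumed model form).
    Each question word is generated by a uniformly chosen answer word
    or by a NULL token; unseen word pairs get a small floor probability."""
    sources = answer + ["<null>"]
    total = 0.0
    for q_word in question:
        p = sum(t_table.get((q_word, a_word), floor) for a_word in sources)
        total += math.log(p / len(sources))
    return total

def lm_logprob(answer, counts, vocab_size):
    """Add-one smoothed unigram log P(answer) (assumed model form)."""
    total_count = sum(counts.values())
    return sum(
        math.log((counts.get(w, 0) + 1) / (total_count + vocab_size))
        for w in answer
    )

def rank_answers(question, candidates, t_table, counts, vocab_size):
    """Sort candidates by log P(question | answer) + log P(answer).
    Summing the two scores with equal weight is an assumption."""
    def score(answer):
        return (translation_logprob(question, answer, t_table)
                + lm_logprob(answer, counts, vocab_size))
    return sorted(candidates, key=score, reverse=True)

# Toy data, invented for illustration only.
query = phrase_query(["how", "do herbal medications", "work"])
t_table = {("sleep", "melatonin"): 0.5}
counts = Counter({"melatonin": 1, "helps": 1, "bananas": 1, "are": 1})
candidates = [["bananas", "are"], ["melatonin", "helps"]]
ranked = rank_answers(["sleep"], candidates, t_table, counts, vocab_size=4)
```

In this toy setup both candidates receive the same language-model score, so the translation table alone decides the order. Note that a raw unigram score favors shorter answers, so a practical scorer would likely need length normalization or tuned weights.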
