Learning to Extract Answers in Question Answering: Experimental Studies

Question Answering (QA) systems are complex programs that answer questions posed in natural language. Their source of information is either a given corpus or, as assumed here, the Web. To achieve their goal, these systems perform several subtasks, the last of which, called answer extraction, is very similar to an Information Extraction task. The main objective of this study is to adapt machine learning techniques defined for Information Extraction to the slightly different task of answer extraction in QA systems. The specificities of QA systems are identified and exploited in this adaptation. Three algorithms, which assume increasingly abstract representations of natural language text, are tested and compared.
