Reducing Question Answering Input Data Using Named Entity Recognition

In a previous paper we showed that Named Entity Recognition (NER) plays an important role in improving Question Answering, both by increasing the quality of the data and by reducing its quantity. Here we present a more in-depth discussion, studying several ways in which NER can be applied to maximize data reduction. We achieve a 60% reduction without significant data loss and a 92.5% reduction with a reasonable impact on data quality.
