QA on the Web: a preliminary study for Spanish language

Finding accurate information on the Web has become a challenge because of the growing number of documents available online. Current search engines retrieve documents relevant to general (often short) user queries, but fail to extract answers to simple factual questions posed in natural language. This work presents the basis of a statistical question answering system capable of finding answers on the Web to factual questions in Spanish. The approach relies on data redundancy rather than on sophisticated linguistic analysis of either the questions or the candidate answers. Preliminary results show that it is feasible to find concise and accurate answers on the Web to factual questions posed in Spanish. The study also concludes that the Spanish documents available on the Web are redundant enough for statistical methods like those described in this document to be applied, providing better mechanisms for information access.
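To make the idea of relying on data redundancy concrete, the following is a minimal sketch, not the system described in this work. It assumes the text snippets have already been retrieved from a search engine, and the function names (`tokenize`, `rank_candidates`), the tiny stopword list, and the voting scheme are all illustrative assumptions. Each candidate word sequence receives one vote per snippet in which it appears, so answers repeated across many documents rise to the top without any linguistic analysis.

```python
import re
from collections import Counter

# Tiny illustrative Spanish stopword list (an assumption, not from the paper).
STOPWORDS = {"el", "la", "los", "las", "de", "del", "en", "es", "fue",
             "un", "una", "y", "que", "por", "a", "se"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (accents are kept)."""
    return re.findall(r"\w+", text.lower())


def rank_candidates(question, snippets, max_n=2):
    """Rank candidate answers by the number of snippets that contain them.

    Candidates are word n-grams (up to max_n words) that contain neither
    stopwords nor words already present in the question.
    """
    question_words = set(tokenize(question))
    votes = Counter()
    for snippet in snippets:
        tokens = tokenize(snippet)
        seen = set()  # each snippet votes at most once per candidate
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if any(t in STOPWORDS or t in question_words for t in gram):
                    continue
                seen.add(" ".join(gram))
        votes.update(seen)
    return votes.most_common()


if __name__ == "__main__":
    # Hypothetical snippets standing in for search engine results.
    snippets = [
        "Miguel de Cervantes escribió Don Quijote en 1605.",
        "Don Quijote fue escrito por Cervantes.",
        "Cervantes es el autor de Don Quijote de la Mancha.",
    ]
    for candidate, count in rank_candidates("¿Quién escribió Don Quijote?", snippets)[:3]:
        print(candidate, count)
```

In this toy setting the candidate that appears in every snippet collects the most votes. A real system would also need question-type filtering and a far larger snippet set for redundancy to be reliable.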
