Retrieving answers from frequently asked questions pages on the web

We address the task of answering natural language questions using the large number of Frequently Asked Questions (FAQ) pages available on the web. The task involves three steps: (1) fetching FAQ pages from the web; (2) automatically extracting question/answer (Q/A) pairs from the collected pages; and (3) answering users' questions by retrieving appropriate Q/A pairs. We discuss our solutions for each of the three steps and give detailed evaluation results on a collected corpus of about 3.6 GB of text data (293K pages, 2.8M Q/A pairs), with real users' questions sampled from a web search engine log. Specifically, we propose simple but effective methods for Q/A extraction and investigate task-specific retrieval models for answering questions. Our best model finds answers for 36% of the test questions in the top 20 results. Our overall conclusion is that FAQ pages on the web provide an excellent resource for addressing real users' information needs in a highly focused manner.
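The headline figure above (answers found for 36% of test questions in the top 20 results) is a success-at-k style measure. As a minimal sketch of how such a figure could be computed, assuming each test question has a ranked list of retrieved Q/A pair identifiers and a set of identifiers judged to be adequate answers: the function name `success_at_k`, the dictionary layout, and the toy data are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Set


def success_at_k(
    ranked: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 20,
) -> float:
    """Fraction of questions with at least one relevant Q/A pair in the top k.

    ranked   maps a question id to its retrieved Q/A pair ids, best first.
    relevant maps a question id to the Q/A pair ids judged to answer it.
    Both structures are hypothetical; the paper's own data format is not
    described in the abstract.
    """
    if not ranked:
        return 0.0
    hits = 0
    for qid, results in ranked.items():
        judged = relevant.get(qid, set())
        # A question counts as answered if any of its top-k results is relevant.
        if any(pair_id in judged for pair_id in results[:k]):
            hits += 1
    return hits / len(ranked)


# Toy data (invented for illustration only).
toy_ranked = {
    "q1": ["p7", "p3", "p9"],   # relevant pair p3 at rank 2
    "q2": ["p1", "p2"],         # no relevant pair retrieved
    "q3": ["p4", "p5", "p6"],   # relevant pair p6 at rank 3
    "q4": [],                   # nothing retrieved
}
toy_relevant = {"q1": {"p3"}, "q2": {"p8"}, "q3": {"p6"}}

score_top20 = success_at_k(toy_ranked, toy_relevant, k=20)
score_top2 = success_at_k(toy_ranked, toy_relevant, k=2)
```

With this toy data, two of the four questions have a relevant pair within the top 20, and only one has it within the top 2, so the measure is sensitive to the cutoff chosen.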
