Is Question Answering Better than Information Retrieval? Towards a Task-Based Evaluation Framework for Question Series

This paper introduces a novel evaluation framework for question series and employs it to explore the effectiveness of QA and IR systems at addressing users’ information needs. The framework is based on the notion of recall curves, which characterize the amount of relevant information contained within a fixed-length text segment. Although it is widely assumed that QA technology provides more efficient access to information than IR systems, our experiments show that a simple IR baseline is quite competitive. These results help us better understand the role of NLP technology in QA systems and suggest directions for future research.
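
As an illustration of the recall-curve idea, the following minimal sketch computes the fraction of relevant information units recovered within a growing length budget. The abstract does not specify the details, so several choices here are assumptions: relevant information is represented as a list of short "nugget" strings, a nugget counts as found when it appears as a case-insensitive substring, and length is measured in characters. The names `recall_at_length` and `recall_curve` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def recall_at_length(text: str, nuggets: Sequence[str], cutoff: int) -> float:
    """Fraction of nuggets present in the first `cutoff` characters of `text`.

    Assumption: a nugget is "present" if it occurs as a case-insensitive
    substring. A real evaluation would likely rely on human or more
    sophisticated automatic matching.
    """
    if not nuggets:
        return 0.0
    prefix = text[:cutoff].lower()
    found = sum(1 for nugget in nuggets if nugget.lower() in prefix)
    return found / len(nuggets)


def recall_curve(
    text: str, nuggets: Sequence[str], cutoffs: Sequence[int]
) -> List[Tuple[int, float]]:
    """Recall at each length cutoff, as (cutoff, recall) pairs."""
    return [(c, recall_at_length(text, nuggets, c)) for c in cutoffs]
```

Under these assumptions, the output of a QA system and the output of an IR baseline can be compared by computing one curve per system over the same cutoffs and checking which reaches a given recall level with less text.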
