Evaluation of response quality for heterogeneous question answering systems

This paper makes explicit why existing measures for evaluating response quality are not suitable for the ever-evolving field of question answering, and then puts forward a short-term solution for evaluating the response quality of heterogeneous systems. To demonstrate the challenges in evaluating systems of different natures, it presents a black-box approach that uses a classification scheme and a scoring mechanism to assess and rank three example systems.
