A Quantitative Evaluation of Natural Language Question Interpretation for Question Answering Systems

Systematic benchmark evaluation plays an important role in improving technologies for Question Answering (QA) systems. Although a number of evaluation methods exist for natural language (NL) QA systems, most of them consider only the final answers, limiting their utility to black-box style evaluation. Herein, we propose a subdivided evaluation approach that enables finer-grained evaluation of QA systems, and we present an evaluation tool targeting the NL question (NLQ) interpretation step, the initial step of a QA pipeline. Experiments on two public benchmark datasets suggest that the proposed approach yields deeper insight into the performance of a QA system than black-box style approaches do, and should therefore provide better guidance for improving such systems.
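To make the idea of evaluating the NLQ-interpretation step in isolation concrete, the sketch below scores a system's generated SPARQL query against the benchmark's gold query by executing both and comparing their result sets, rather than scoring only the pipeline's final answers. This is a minimal illustration, not the authors' tool: it assumes benchmark entries pair each NL question with a gold SPARQL query (as in QALD or LC-QuAD), `interpret_question` is a hypothetical placeholder for the interpretation component under evaluation, and queries are SELECT queries run against a public endpoint via the SPARQLWrapper library.

```python
# Sketch: module-level evaluation of the NLQ-interpretation step.
# Both the system-generated query and the gold query are executed,
# and their result sets are compared per question (set-based F1).
# Requires: pip install sparqlwrapper
from SPARQLWrapper import SPARQLWrapper, JSON


def result_set(query: str, endpoint: str = "https://dbpedia.org/sparql") -> set:
    """Execute a SPARQL SELECT query and flatten its bindings into a set."""
    client = SPARQLWrapper(endpoint)
    client.setQuery(query)
    client.setReturnFormat(JSON)
    bindings = client.query().convert()["results"]["bindings"]
    return {tuple(sorted((var, b[var]["value"]) for var in b)) for b in bindings}


def prf1(predicted: set, gold: set) -> tuple:
    """Set-based precision, recall, and F1 for a single question."""
    if not predicted and not gold:
        return 1.0, 1.0, 1.0  # both empty: treat as correct
    overlap = len(predicted & gold)
    p = overlap / len(predicted) if predicted else 0.0
    r = overlap / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def evaluate_interpretation(benchmark, interpret_question):
    """Macro-averaged F1 of the interpretation step over a benchmark.

    `benchmark` is an iterable of (nl_question, gold_sparql) pairs;
    `interpret_question` (hypothetical) maps an NL question to a SPARQL query.
    """
    f1_scores = []
    for question, gold_query in benchmark:
        predicted_query = interpret_question(question)
        _, _, f1 = prf1(result_set(predicted_query), result_set(gold_query))
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)
```

Because the score is computed from the interpretation step's output alone, a low value here localizes the error to question interpretation even when downstream components would mask or compound it in a black-box evaluation.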
