Why Reinvent the Wheel: Let's Build Question Answering Systems Together

Modern question answering (QA) systems need to flexibly integrate a number of components, each specialised to fulfil a specific task in a QA pipeline. Key QA tasks include Named Entity Recognition and Disambiguation, Relation Extraction, and Query Building. Since many software components exist that implement different strategies for each of these tasks, selecting and combining the most suitable components into a QA system, given the characteristics of a question, is a major challenge. We study this optimisation problem and train classifiers that take features of a question as input and optimise the selection of QA components based on those features. We then devise a greedy algorithm to identify the pipelines that include the suitable components and can effectively answer the given question. We implement this model within Frankenstein, a QA framework able to select QA components and compose QA pipelines. We evaluate the effectiveness of the pipelines generated by Frankenstein using the QALD and LC-QuAD benchmarks. The results not only suggest that Frankenstein precisely solves the QA optimisation problem but also show that it enables the automatic composition of optimised QA pipelines, which outperform the static baseline QA pipeline. Thanks to this flexible and fully automated pipeline-generation process, new QA components can easily be included in Frankenstein, thus improving the performance of the generated pipelines.
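The selection strategy described above can be sketched as follows. This is a minimal, hypothetical illustration, not Frankenstein's actual implementation: the component names and the hard-coded scores are assumptions standing in for the per-component classifiers, which in the real system would predict each component's performance from the features of the input question. The greedy step then simply keeps the highest-scoring component for each task.

```python
# Greedy composition of a QA pipeline from per-component performance scores.
# Component names and scores below are illustrative assumptions; in the
# framework described above, the scores would come from trained classifiers
# applied to the features of a concrete question.

QA_TASKS = ["NER", "NED", "RE", "QB"]  # ordered stages of the pipeline

# Hypothetical predicted-performance scores per (task, component) for one question.
PREDICTED = {
    "NER": {"EntityClassifier": 0.62, "StanfordNER": 0.71},
    "NED": {"AGDISTIS": 0.58, "DBpediaSpotlight": 0.66},
    "RE":  {"RelMatch": 0.49, "ReMatch": 0.55},
    "QB":  {"SINA": 0.60, "NLIWOD-QB": 0.52},
}

def greedy_pipeline(predicted_scores):
    """Greedily pick the highest-scoring component for each QA task in order."""
    pipeline = {}
    for task in QA_TASKS:
        scores = predicted_scores[task]
        pipeline[task] = max(scores, key=scores.get)
    return pipeline

if __name__ == "__main__":
    print(greedy_pipeline(PREDICTED))
```

Because the choice for each task is made independently, the algorithm runs in time linear in the total number of components, which is what makes it practical to re-run the selection for every incoming question.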
