A comparative survey of recent natural language interfaces for databases

Over the last few years, natural language interfaces (NLIs) for databases have gained significant traction in both academia and industry. These systems use very different approaches, as described in recent survey papers. However, they have not been systematically compared against a set of benchmark questions in order to rigorously evaluate their functionality and expressive power. In this paper, we give an overview of 24 recently developed NLIs for databases. Each system is evaluated using a curated list of ten sample questions to show its strengths and weaknesses. We categorize the NLIs into four groups based on the methodology they use: keyword-, pattern-, parsing-, and grammar-based NLIs. Overall, we learned that keyword-based systems are sufficient to answer simple questions. To answer more complex questions involving subqueries, a system needs to apply some form of parsing to identify structural dependencies. Grammar-based systems are overall the most powerful, but they depend heavily on their manually designed rules. In addition to providing a systematic analysis of the major systems, we derive lessons learned that are vital for designing NLIs that can answer a wide range of user questions.
