State of the Art and Open Challenges in Natural Language Interfaces to Data

Recent advances in natural language understanding and processing have renewed interest in natural language interfaces to data, which give non-technical users an easy way to access and query data. While early systems allowed only simple selection queries over a single table, some recent work supports complex BI queries with many joins and aggregations, and even nested queries. The literature offers various approaches to interpreting a user's natural language query. Rule-based systems try to identify the entities in the query and understand the intended relationships between those entities. Recent years have seen the emergence and growing popularity of neural-network-based approaches, which try to interpret the query holistically by learning patterns from data. In this tutorial, we will review these natural language interface solutions in terms of their interpretation approach and the complexity of the queries they can generate. We will also discuss open research challenges.
