Ontology-Based Natural Language Query Interfaces for Data Exploration

Enterprises are creating domain-specific knowledge bases by curating and integrating all of their business data (structured, semi-structured, and unstructured), and they use these knowledge bases in enterprise applications to make better business decisions. Compared to open-domain, general-purpose knowledge bases such as DBpedia [16] and Freebase [6], one distinctive characteristic of these enterprise knowledge bases is their deep domain specialization. This deep domain understanding enables many applications in domains such as health care and finance. Exploring such knowledge bases and operational data stores requires a range of querying capabilities. In addition to search, these databases require very precise structured queries, including aggregations, as well as complex graph queries that reveal the relationships among the entities of the domain. For example, in a financial knowledge base, users may want to find out “which startups raised the most VC funding in the first quarter of 2017”, a very precise query that is best expressed in SQL. The same users may also want to find all possible relationships between two specific board members of these startups, a query that is naturally expressed as an all-paths graph query. General-purpose knowledge bases could also benefit from these different query capabilities, but in this paper we focus on domain-specific knowledge graphs and their query needs. Rather than learning and using several complex query languages, a natural way to explore the data in these cases is through a natural language interface. Indeed, human interaction with technology through conversational services has made significant strides in many application domains in recent years [13]. Such interfaces are highly desirable because they do not require users to learn a complex query language such as SQL, nor to know the exact schema of the data or how it is stored.
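To make the contrast between the two query types concrete, the following Python sketch runs both over toy data. The table `funding_round`, its columns, the entities (AlphaAI, Alice, FundX, and so on), and the `all_paths` helper are hypothetical illustrations invented for this example, not part of the system described in this paper. The aggregation is plain SQL over an in-memory SQLite database; the graph query is a simple depth-first enumeration of simple paths with a hop limit, one straightforward way to evaluate an all-paths query.

```python
import sqlite3
from collections import defaultdict

# --- 1. Precise aggregation query, best expressed in SQL --------------------
# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE funding_round ("
    "startup TEXT, investor_type TEXT, amount_usd INTEGER, round_date TEXT)"
)
conn.executemany(
    "INSERT INTO funding_round VALUES (?, ?, ?, ?)",
    [
        ("AlphaAI", "VC", 5_000_000, "2017-02-10"),
        ("AlphaAI", "VC", 3_000_000, "2017-03-28"),
        ("BetaBio", "VC", 6_000_000, "2017-01-15"),
        ("BetaBio", "Angel", 9_000_000, "2017-02-01"),  # not VC: filtered out
        ("GammaPay", "VC", 12_000_000, "2017-04-02"),   # Q2: filtered out
    ],
)

# "Which startups raised the most VC funding in the first quarter of 2017?"
TOP_VC_Q1_2017 = """
    SELECT startup, SUM(amount_usd) AS total_usd
    FROM funding_round
    WHERE investor_type = 'VC'
      AND round_date BETWEEN '2017-01-01' AND '2017-03-31'  -- ISO dates compare as text
    GROUP BY startup
    ORDER BY total_usd DESC
"""
top_startups = conn.execute(TOP_VC_Q1_2017).fetchall()
print(top_startups)

# --- 2. All-paths graph query ------------------------------------------------
# Hypothetical (subject, relation, object) triples from a toy knowledge graph.
triples = [
    ("Alice", "board_member_of", "AlphaAI"),
    ("Bob", "board_member_of", "AlphaAI"),
    ("Alice", "partner_at", "FundX"),
    ("Bob", "partner_at", "FundX"),
    ("Bob", "board_member_of", "BetaBio"),
]
graph = defaultdict(list)
for subj, rel, obj in triples:
    graph[subj].append((rel, obj))
    graph[obj].append((rel, subj))  # relationships are traversed in both directions


def all_paths(graph, start, goal, max_hops=4):
    """Return every simple path (no entity repeated) from start to goal,
    as [entity, relation, entity, ..., goal], using at most max_hops edges."""
    paths = []

    def dfs(node, path, visited):
        if node == goal:
            paths.append(list(path))
            return
        if (len(path) - 1) // 2 >= max_hops:  # number of edges taken so far
            return
        for rel, nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                path.extend([rel, nxt])
                dfs(nxt, path, visited)
                del path[-2:]  # backtrack: drop the relation and the entity
                visited.remove(nxt)

    dfs(start, [start], {start})
    return paths


for p in all_paths(graph, "Alice", "Bob"):
    print(" -> ".join(p))
```

With this toy data, the SQL query ranks AlphaAI (8,000,000 across two VC rounds) ahead of BetaBio (6,000,000), and the graph query finds two paths between Alice and Bob: one through their shared board seat at AlphaAI and one through FundX. The hop limit is a deliberate choice in the sketch: the number of simple paths between two nodes can grow exponentially with graph size, so bounding path length is a common safeguard.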
There are several challenges in building a natural language interface to query data sets. The most difficult task is understanding the semantics of the query, and hence the user's intent. Early systems [3, 30] accepted only a set of keywords, which had very limited expressive power. Later works have attempted to interpret the semantics of full natural language queries. In general, these works try to disambiguate among the potentially multiple meanings of the words and their relationships. Some of them are based on machine learning [5, 24, 29] and require good training sets, which are hard to obtain. Others rely on user feedback [14, 17, 18]; however, excessive user interaction to resolve ambiguities can degrade the user experience. In this paper, we describe a unique end-to-end ontology-based system for natural language querying over complex data sets. The system uses domain ontologies, which describe the semantic entities and their relationships, to reason about and capture user intent. To support multiple query types, the system provides a polystore

[1] P. Cochat, et al. Et al, 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[2] Oren Etzioni, et al. Towards a theory of natural language interfaces to databases, 2003, IUI '03.

[3] Varish Mulwad, et al. Integrated access to big data polystores through a knowledge-driven framework, 2017, 2017 IEEE International Conference on Big Data (Big Data).

[4] Fei Li, et al. Constructing an Interactive Natural Language Interface for Relational Databases, 2014, Proc. VLDB Endow.

[5] Raymond J. Mooney, et al. Using Multiple Clause Constructors in Inductive Logic Programming for Semantic Parsing, 2001, ECML.

[6] Praveen Paritosh, et al. Freebase: a collaboratively created graph database for structuring human knowledge, 2008, SIGMOD Conference.

[7] Oren Etzioni, et al. Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability, 2004, COLING.

[8] Mihai Surdeanu, et al. The Stanford CoreNLP Natural Language Processing Toolkit, 2014, ACL.

[9] Ian Horrocks, et al. A semantic approach to polystores, 2016, 2016 IEEE International Conference on Big Data (Big Data).

[10] H. V. Jagadish, et al. NaLIX: an interactive natural language interface for querying XML, 2005, SIGMOD '05.

[11] Umar Farooq Minhas, et al. ATHENA: An Ontology-Driven System for Natural Language Querying over Relational Data Stores, 2016, Proc. VLDB Endow.

[12] Umair Shafique, et al. A Comprehensive Study on Natural Language Processing and Natural Language Interface to Databases, 2014.

[13] Andrew Chou, et al. Semantic Parsing on Freebase from Question-Answer Pairs, 2013, EMNLP.

[14] Chris Callison-Burch, et al. PPDB: The Paraphrase Database, 2013, NAACL.

[15] Sandeep Tata, et al. SQAK: doing more with keywords, 2008, SIGMOD Conference.

[16] Peter Thanisch, et al. Natural language interfaces to databases – an introduction, 1995, Natural Language Engineering.

[17] Roy Goldman, et al. DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases, 1997, VLDB.

[18] Senthil Mani, et al. Natural language querying in SAP-ERP platform, 2017, ESEC/SIGSOFT FSE.

[19] Pedro M. Domingos, et al. Joint Unsupervised Coreference Resolution with Markov Logic, 2008, EMNLP.

[20] Rajasekar Krishnamurthy, et al. Creation and Interaction with Large-scale Domain-Specific Knowledge Bases, 2017, Proc. VLDB Endow.

[21] Diego Calvanese, et al. The MASTRO system for ontology-based data access, 2011, Semantic Web.

[22] Vincent Ng, et al. Supervised Noun Phrase Coreference Research: The First Fifteen Years, 2010, ACL.

[23] Dietmar F. Rösner, et al. NAUDA: a cooperative natural language interface to relational databases, 1993, SIGMOD '93.

[24] S. Sudarshan, et al. BANKS: Browsing and Keyword Searching in Relational Databases, 2002, VLDB.

[25] Hamid Pirahesh, et al. Extensible/rule based query rewrite optimization in Starburst, 1992, SIGMOD '92.

[26] Jens Lehmann, et al. DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia, 2015, Semantic Web.

[27] Jonathan Ginzburg, et al. Non-Sentential Utterances: Grammar and Dialogue Dynamics in Corpus Annotation, 2002, COLING.

[28] Heeyoung Lee, et al. Stanford’s Multi-Pass Sieve Coreference Resolution System at the CoNLL-2011 Shared Task, 2011, CoNLL Shared Task.