ATHENA: An Ontology-Driven System for Natural Language Querying over Relational Data Stores

In this paper, we present ATHENA, an ontology-driven system for natural language querying of complex relational databases. Natural language interfaces to databases give users easy access to data without requiring them to learn a complex query language such as SQL. ATHENA uses domain-specific ontologies, which describe the semantic entities in a domain and the relationships among them. We propose a unique two-stage approach, in which the input natural language query (NLQ) is first translated into an intermediate query language over the ontology, called OQL, and subsequently translated into SQL. This two-stage approach decouples the physical layout of the data in the relational store from the semantics of the query, providing physical independence. Moreover, ontologies provide richer semantic information, such as inheritance and membership relations, that is lost in a relational schema. By reasoning over the ontologies, our NLQ engine is able to accurately capture the user intent. We study the effectiveness of our approach using three different workloads on top of geographical (GEO), academic (MAS), and financial (FIN) data. ATHENA achieves 100% precision on the GEO and MAS workloads, and 99% precision on the FIN workload, which operates on a complex financial ontology. Moreover, ATHENA attains 87.2%, 88.3%, and 88.9% recall on the GEO, MAS, and FIN workloads, respectively.
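
The physical independence described above can be illustrated with a minimal sketch. It is not ATHENA's implementation and does not use real OQL syntax: the `OntologyQuery` class, the two mapping tables, and the `to_sql` function are hypothetical names invented for illustration, and the sketch only handles queries that touch a single table. It assumes the first stage (NLQ to ontology-level query) has already run, and shows how the same ontology-level query could be rendered as different SQL once the ontology-to-schema mapping changes.

```python
from dataclasses import dataclass

# Hypothetical, simplified stand-in for an ontology-level query.
# This is NOT the OQL language from the paper, only an illustration.
@dataclass(frozen=True)
class OntologyQuery:
    select: tuple  # (concept, property) pairs to return
    where: tuple   # (concept, property, operator, value) conditions

# Two hypothetical physical layouts for the same ontology.
# Each maps (concept, property) -> (table, column).
MAPPING_A = {
    ("State", "name"): ("states", "state_name"),
    ("State", "population"): ("states", "pop"),
}
MAPPING_B = {
    ("State", "name"): ("geo_region", "label"),
    ("State", "population"): ("geo_region", "inhabitants"),
}

ALLOWED_OPERATORS = {"=", "<", ">", "<=", ">="}

def to_sql(query, mapping):
    """Second stage only: render an ontology-level query as parameterized SQL."""
    columns = [mapping[(c, p)] for c, p in query.select]
    tables = {t for t, _ in columns}
    conditions, params = [], []
    for concept, prop, op, value in query.where:
        if op not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {op}")
        table, column = mapping[(concept, prop)]
        tables.add(table)
        conditions.append(f"{table}.{column} {op} ?")
        params.append(value)
    if len(tables) != 1:
        # Join inference is out of scope for this sketch.
        raise NotImplementedError("only single-table queries are supported")
    sql = "SELECT " + ", ".join(f"{t}.{c}" for t, c in columns)
    sql += " FROM " + next(iter(tables))
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params

# Ontology-level form of a question such as
# "names of states with population over one million" (hand-written here).
query = OntologyQuery(
    select=(("State", "name"),),
    where=(("State", "population", ">", 1_000_000),),
)

sql_a, params_a = to_sql(query, MAPPING_A)
sql_b, params_b = to_sql(query, MAPPING_B)
print(sql_a, params_a)
print(sql_b, params_b)
```

The ontology-level query is unchanged between the two calls; only the mapping differs. That is the sense in which an intermediate representation over the ontology separates query semantics from the physical layout.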
