Towards a theory of natural language interfaces to databases

The need for Natural Language Interfaces to databases (NLIs) has become increasingly acute as more and more people access information through their web browsers, PDAs, and cell phones. Yet NLIs are only usable if they map natural language questions to SQL queries correctly. As Shneiderman and Norman have argued, people are unwilling to trade reliable and predictable user interfaces for intelligent but unreliable ones. In this paper, we introduce a theoretical framework for reliable NLIs, which is the foundation for the fully implemented Precise NLI. We prove that, for a broad class of semantically tractable natural language questions, Precise is guaranteed to map each question to the corresponding SQL query. We report on experiments testing Precise on several hundred questions drawn from user studies over three benchmark databases. We find that over 80% of the questions are semantically tractable, and Precise answers these correctly. Precise automatically recognizes the remaining questions (under 20%), which it cannot handle, and requests a paraphrase. Finally, we show that Precise compares favorably with Mooney's learning NLI and with Microsoft's English Query product.
