Towards a theory of natural language interfaces to databases

The need for natural language interfaces to databases (NLIs) has become increasingly acute as more and more people access information through their web browsers, PDAs, and cell phones. Yet NLIs are usable only if they map natural language questions to SQL queries correctly. As Shneiderman and Norman have argued, people are unwilling to trade reliable and predictable user interfaces for intelligent but unreliable ones. In this paper, we introduce a theoretical framework for reliable NLIs, which is the foundation for the fully implemented Precise NLI. We prove that, for a broad class of semantically tractable natural language questions, Precise is guaranteed to map each question to the corresponding SQL query. We report on experiments testing Precise on several hundred questions drawn from user studies over three benchmark databases. We find that over 80% of the questions are semantically tractable, and Precise answers them correctly. Precise automatically recognizes the remaining questions (under 20%), which it cannot handle, and requests a paraphrase. Finally, we show that Precise compares favorably with Mooney's learning NLI and with Microsoft's English Query product.
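
The abstract's central claim is behavioral: Precise either maps a question to a SQL query it can stand behind, or declines and asks for a paraphrase. What follows is a toy illustration of that answer-or-refuse contract only. It is not Precise's algorithm, and it does not implement the paper's definition of semantic tractability, which the abstract does not spell out. The lexicon, the restaurant schema, the stopword list, and the function name `to_sql` are all hypothetical. Here a question counts as "tractable" only if every non-stopword has a lexicon entry and the words name exactly one table.

```python
from typing import Optional

# Hypothetical toy lexicon: word -> (kind, database element).
# NOT Precise's lexicon or tractability test; illustration only.
LEXICON = {
    "restaurant": ("relation", "restaurant"),
    "restaurants": ("relation", "restaurant"),
    "name": ("attribute", "name"),
    "names": ("attribute", "name"),
    "city": ("attribute", "city"),
    "seattle": ("value", ("city", "Seattle")),
    "italian": ("value", ("cuisine", "Italian")),
}
# Hypothetical stopword list: words ignored when interpreting a question.
STOPWORDS = {"what", "are", "is", "the", "of", "in", "which", "show", "me", "a", "an"}


def to_sql(question: str) -> Optional[str]:
    """Return a SQL string, or None to signal 'please paraphrase'."""
    words = question.lower().replace("?", " ").replace(",", " ").split()
    relations: set = set()
    attributes: list = []
    conditions: list = []
    for word in words:
        if word in STOPWORDS:
            continue
        entry = LEXICON.get(word)
        if entry is None:
            return None  # unknown word: refuse rather than guess
        kind, element = entry
        if kind == "relation":
            relations.add(element)
        elif kind == "attribute":
            if element not in attributes:  # keep order, drop duplicates
                attributes.append(element)
        else:
            conditions.append(element)  # (column, value) pair
    if len(relations) != 1:
        return None  # no table or several tables named: refuse
    table = relations.pop()
    columns = ", ".join(attributes) if attributes else "*"
    sql = f"SELECT {columns} FROM {table}"
    if conditions:
        where = " AND ".join(f"{col} = '{val}'" for col, val in conditions)
        sql += f" WHERE {where}"
    return sql
```

For example, `to_sql("What are the names of Italian restaurants in Seattle?")` yields `SELECT name FROM restaurant WHERE cuisine = 'Italian' AND city = 'Seattle'`, while a question containing a word outside the lexicon, such as "best", returns `None`, which a caller would turn into a paraphrase request. Building SQL by string formatting is tolerable here only because every value comes from the fixed lexicon; a real interface should use parameterized queries.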

Oren Etzioni, et al. Towards a theory of natural language interfaces to databases, 2003, IUI.