English Access to Structured Data

We present work on using a domain model to guide text interpretation, in the context of a project that aims to interpret English questions as a sequence of queries to be answered from structured databases. We adapt a broad-coverage, ambiguity-enabled natural language processing (NLP) system to produce domain-specific logical forms, using knowledge of the domain to select the appropriate interpretation. The vocabulary of the logical forms is drawn from a domain theory that constitutes a higher-level abstraction of the contents of a set of related databases, and the meanings of its terms are encoded as axioms in that theory. To retrieve information from the databases, the logical forms must be instantiated by values constructed from fields in the database. The first-order theorem prover SNARK interprets the axiomatic domain theory to identify these groundings and then retrieves the values through procedural attachments that are semantically linked to the database. SNARK attempts to prove the logical form as a theorem by reasoning over the theory and returns the exemplars of the proof(s) to the user as answers to the query. Although the focus of this paper is on the language task, we also discuss the interaction that must occur between linguistic analysis and reasoning in an end-to-end natural language interface to databases. We illustrate the process using examples drawn from an HIV treatment domain, where the underlying databases are records of temporally bound treatments of individual patients.
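
To make the idea of answering a query by proving a logical form more concrete, the following is a minimal sketch in Python. It is not SNARK and does not reproduce the system described here: the table contents, the predicate names `treatment` and `before`, the `?`-prefixed variable convention, and the example question are all assumptions chosen for illustration. The sketch treats a logical form as a conjunction of atoms, resolves each atom through a function standing in for a procedural attachment to the database, and returns the variable bindings that make the whole conjunction true.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical table of temporally bound treatments:
# (patient, drug, start_year, end_year). Invented data for illustration only.
TREATMENTS = [
    ("p1", "efavirenz", 2001, 2003),
    ("p1", "nevirapine", 2004, 2006),
    ("p2", "efavirenz", 2005, 2007),
]

Bindings = Dict[str, object]
Atom = Tuple[object, ...]  # (predicate, arg1, arg2, ...)


def is_var(term: object) -> bool:
    """Variables are strings that start with '?' (an assumed convention)."""
    return isinstance(term, str) and term.startswith("?")


def walk(term: object, bindings: Bindings) -> object:
    """Follow variable bindings until reaching a value or an unbound variable."""
    while is_var(term) and term in bindings:
        term = bindings[term]
    return term


def unify(a: object, b: object, bindings: Bindings) -> Optional[Bindings]:
    """Return extended bindings if a and b can be made equal, else None."""
    a, b = walk(a, bindings), walk(b, bindings)
    if a == b:
        return bindings
    if is_var(a):
        return {**bindings, a: b}
    if is_var(b):
        return {**bindings, b: a}
    return None


def treatment(args: List[object], bindings: Bindings) -> Iterator[Bindings]:
    """Stand-in for a procedural attachment: ground the atom from table rows."""
    for row in TREATMENTS:
        current: Optional[Bindings] = bindings
        for arg, value in zip(args, row):
            current = unify(arg, value, current)
            if current is None:
                break
        else:
            yield current


def before(args: List[object], bindings: Bindings) -> Iterator[Bindings]:
    """Temporal test: succeeds only when both arguments are bound and x < y."""
    x, y = (walk(a, bindings) for a in args)
    if not is_var(x) and not is_var(y) and x < y:
        yield bindings


ATTACHMENTS = {"treatment": treatment, "before": before}


def solve(goals: List[Atom], bindings: Optional[Bindings] = None) -> Iterator[Bindings]:
    """Satisfy a conjunction of atoms left to right, yielding each solution."""
    bindings = {} if bindings is None else bindings
    if not goals:
        yield bindings
        return
    predicate, *args = goals[0]
    for extended in ATTACHMENTS[predicate](args, bindings):
        yield from solve(goals[1:], extended)


# Assumed logical form for "Which patients took efavirenz and later nevirapine?"
QUERY: List[Atom] = [
    ("treatment", "?p", "efavirenz", "?s1", "?e1"),
    ("treatment", "?p", "nevirapine", "?s2", "?e2"),
    ("before", "?e1", "?s2"),
]


def answers(query: List[Atom], var: str) -> List[object]:
    """Collect the distinct values bound to var across all solutions."""
    return sorted({walk(var, b) for b in solve(query)})
```

Calling `answers(QUERY, "?p")` enumerates the patients whose records satisfy all three atoms. A full theorem prover additionally reasons with the axioms of the domain theory, which this sketch omits entirely; it only shows how bindings obtained from database rows can serve as answers.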