NaLIX: A generic natural language search environment for XML data

We describe the construction of a generic natural language query interface to an XML database. Our interface accepts a large class of English sentences as queries; these can be quite complex and may include aggregation, nesting, and value joins, among other things. Each query is translated, potentially after reformulation, into an XQuery expression. The translation maps the grammatical proximity of parsed natural language tokens in the parse tree of the query sentence to the proximity of the corresponding elements in the XML data to be retrieved. Iterative search in the form of follow-up queries is also supported. Our experimental assessment, through a user study, demonstrates that this type of natural language interface is good enough to be usable now, with no restrictions on the application domain.
