A natural language interface for information retrieval from forms on the World Wide Web

This paper presents an approach for retrieving information from forms on the World Wide Web using natural language input. The structured nature of a form can be exploited to process natural language queries against web data sources that provide form interfaces. Because the valid values for each field can be determined from the form itself or supplied by a user of the form, the form can be filled out by looking for these values in the user's natural language input. A particular value may be valid for more than one field, so the surrounding context must be used to determine the correct field for an ambiguous value. A statistical disambiguation method based on n-gram statistics is proposed. In a limited domain, this method was shown to work better than disambiguation based on single context words.
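The idea described above can be illustrated with a small sketch. It is not the paper's implementation: the form definition (`FORM_FIELDS`), the labeled queries (`TRAINING`), the function names, and the simple count-based scoring are all assumptions made for illustration. The sketch fills fields by looking up valid values in the input and, when a value is valid for several fields, picks the field whose training contexts share the most word n-grams (here of length 1 and 2) with the words around the value.

```python
from collections import Counter

# Hypothetical form: field name -> set of valid values (not from the paper).
FORM_FIELDS = {
    "origin": {"boston", "denver", "chicago"},
    "destination": {"boston", "denver", "chicago"},
    "class": {"economy", "business"},
}

# Hypothetical labeled queries: (tokens, {token index: correct field}).
TRAINING = [
    ("fly from boston to denver".split(), {2: "origin", 4: "destination"}),
    ("a ticket to chicago from denver".split(), {3: "destination", 5: "origin"}),
    ("from chicago to boston please".split(), {1: "origin", 3: "destination"}),
]


def context_features(tokens, i, max_n=2):
    """Word n-grams (length 1..max_n) directly left and right of position i."""
    feats = []
    for k in range(1, max_n + 1):
        if i - k >= 0:
            feats.append(("L", tuple(tokens[i - k:i])))
        if i + k < len(tokens):
            feats.append(("R", tuple(tokens[i + 1:i + 1 + k])))
    return feats


def train(examples):
    """Count how often each context n-gram occurs next to each field's values."""
    counts = Counter()
    for tokens, labels in examples:
        for i, field in labels.items():
            for feat in context_features(tokens, i):
                counts[(field, feat)] += 1
    return counts


def fill_form(text, fields, counts):
    """Fill form fields from free text, disambiguating shared values by context."""
    tokens = text.lower().split()
    filled = {}
    for i, tok in enumerate(tokens):
        candidates = [f for f, values in fields.items() if tok in values]
        if not candidates:
            continue
        if len(candidates) == 1:
            best = candidates[0]
        else:
            feats = context_features(tokens, i)
            # Assumed scoring: sum of matching n-gram counts; ties go to the
            # first candidate field in form order.
            best = max(candidates,
                       key=lambda f: sum(counts[(f, ft)] for ft in feats))
        filled[best] = tok
    return filled


counts = train(TRAINING)
result = fill_form("I want to go from denver to boston", FORM_FIELDS, counts)
print(result)
```

In this toy data the words "from" and "to" carry most of the signal; a real system would need smoothing and far more training queries, and the scoring function here is only one of several reasonable choices.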
