Paper information - Integrating Web resources and lexicons into a natural language query system

Integrating Web resources and lexicons into a natural language query system

The START system responds to natural language queries with answers in text, pictures, and other media. START's sentence-level natural language parsing relies on a number of mechanisms to help it process the huge, diverse resources available on the World Wide Web. Blitz, a hybrid heuristic- and corpus-based natural language preprocessor, enables START to integrate a large and ever-changing lexicon of proper names by using heuristic rules and precompiled tables of symbols to preprocess various highly regular and fixed expressions into lexical tokens. LaMeTH, a content-based system for extracting information from HTML documents, assists START by providing a uniform method of accessing information on the Web in real time. By expanding START's lexicon and integrating Web resources, these mechanisms have considerably improved its ability to analyze real-world sentences and answer queries.
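The preprocessing idea the abstract attributes to Blitz (heuristic rules plus precompiled tables that turn fixed expressions into single lexical tokens) can be illustrated with a minimal sketch. This is not Blitz's actual implementation, which the abstract does not detail. The table entries, the tag names, the date pattern, and the function names `merge_names` and `preprocess` are all assumptions made for illustration.

```python
import re

# Hypothetical heuristic rule: dates written like "July 4, 1976".
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE_RE = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")

# Simplified tokenizer: runs of word characters, or single punctuation marks.
WORD_RE = re.compile(r"\w+|[^\w\s]")

# Hypothetical precompiled table of multi-word proper names (illustrative only).
NAME_TABLE = {
    ("new", "york", "city"): "PLACE",
    ("new", "york"): "PLACE",
    ("boris", "katz"): "PERSON",
}
MAX_LEN = max(len(key) for key in NAME_TABLE)


def merge_names(words):
    """Greedy longest-match lookup of word sequences in NAME_TABLE."""
    out = []
    i = 0
    while i < len(words):
        # Try the longest candidate first so "New York City" beats "New York".
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            key = tuple(w.lower() for w in words[i:i + n])
            if key in NAME_TABLE:
                out.append((" ".join(words[i:i + n]), NAME_TABLE[key]))
                i += n
                break
        else:
            # No table entry starts here; keep the word (or punctuation) as is.
            out.append((words[i], "WORD"))
            i += 1
    return out


def preprocess(sentence):
    """Return (token, tag) pairs with dates and known names merged into one token."""
    tokens = []
    pos = 0
    for match in DATE_RE.finditer(sentence):
        # Text before the date goes through the table lookup.
        tokens += merge_names(WORD_RE.findall(sentence[pos:match.start()]))
        tokens.append((match.group(), "DATE"))
        pos = match.end()
    tokens += merge_names(WORD_RE.findall(sentence[pos:]))
    return tokens


if __name__ == "__main__":
    for token in preprocess("Who was mayor of New York City on July 4, 1976?"):
        print(token)
```

In this sketch a downstream parser would see "New York City" and "July 4, 1976" as single tokens rather than as sequences of unrelated words, which is the kind of simplification the abstract describes.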

[1] Claire Cardie,et al. UMass/Hughes: Description of the CIRCUS System Used for MUC-5 , 1993, MUC.

[2] Richard M. Schwartz,et al. Nymble: a High-Performance Learning Name-finder , 1997, ANLP.

[3] Boris Katz,et al. Annotating the World Wide Web using Natural Language , 1997, RIAO.

[4] Nina Wacholder,et al. Extracting Names from Natural-Language Text , 2000 .

[5] Douglas E. Appelt,et al. The SRI MUC-5 JV-FASTUS Information Extraction System , 1993 .

[6] Nina Wacholder,et al. Disambiguation of Proper Names in Text , 1997, ANLP.

[7] Boris Katz,et al. Using English for Indexing and Retrieving , 1991 .

[8] Phil Hayes,et al. NameFinder: Software that finds Names in Text , 1994, RIAO.