Innovations in Text Interpretation

Abstract. The field of natural language processing is developing a new concentration on interpreting extended texts, with applications in information retrieval, text categorization, and data extraction. The research that addresses these problems represents the first real task-driven focus since machine translation research in the 1960s. Text interpretation applications have already produced good results in accuracy and throughput. This new focus on task-driven text interpretation has been the driving force for a number of advances in the field, because earlier systems fell so far short of the coverage required to interpret bodies of text. The innovations behind this scale-up include work in lexicon development and representation, weak methods of corpus analysis and text pre-processing, and flexible control architectures for parsing. Together, these methods provide coverage and accuracy in interpretation by extending the knowledge that a system can use and controlling how this knowledge is applied. This paper explains the context in which this research is conducted, along with the general progress of the field and some of the details of how our own system realizes these advances.
