Information extraction

There may be more text data in electronic form than ever before, but much of it is ignored. No human can read, understand, and synthesize megabytes of text on an everyday basis. Missed information, and the lost opportunities that come with it, has spurred researchers to explore various information management strategies to establish order in the text wilderness. The most common strategies are information retrieval (IR) and information filtering [4]. A relatively new development, information extraction (IE), is the subject of this article. We can view IR systems as combine harvesters that bring back useful material from vast fields of raw material. With large amounts of potentially useful information in hand, an IE system can then transform the raw material, refining and reducing it to a germ of the original text (see Figure 1). Suppose financial analysts are investigating production of semiconductor devices (see Figure 2). They might want to know several things:

[1]  Scott A. Waterman,et al.  The Diderot information extraction system , 1992 .

[2]  Shoshana Loeb,et al.  Information filtering , 1992, CACM.

[3]  Ramanathan V. Guha,et al.  Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project , 1990 .

[4]  Fabio Ciravegna,et al.  Knowledge Extraction From Texts by Sintesi , 1992, COLING.

[5]  Ellen Riloff,et al.  Automatically Acquiring Conceptual Patterns without an Annotated Corpus , 1995, VLC@ACL.

[6]  Douglas E. Appelt,et al.  SRI International: description of the FASTUS system used for MUC-4 , 1992, MUC.

[7]  Yorick Wilks,et al.  Making Preferences More Active , 1978, Artif. Intell.

[8]  Richard Granger,et al.  FOUL-UP: A Program that Figures Out Meanings of Words from Context , 1977, IJCAI.

[9]  Yorick Wilks,et al.  University of Sheffield: description of the LaSIE system as used for MUC-6 , 1995, MUC.

[10]  Hatte Blejer,et al.  The MURASAKI Project: Multilingual Natural Language Understanding , 1993, HLT.

[11]  Roger C. Schank,et al.  Scripts, Plans, Goals, and Understanding , 1988 .

[12]  Barbara B. Levin,et al.  English verb classes and alternations , 1993 .

[13]  Bonnie J. Dorr,et al.  Role of Word Sense Disambiguation in Lexical Acquisition: Predicting Semantics from Syntactic Cues , 1996, COLING.

[14]  R. N. Indah Language and Speech , 1958, Nature.

[15]  Elizabeth D. Liddy,et al.  Interpretation of Proper Nouns for Information Retrieval , 1993, HLT.

[16]  Adam Kilgarriff,et al.  Dictionary word sense distinctions: An enquiry into their nature , 1992, Comput. Humanit.

[17]  L. F. Rau,et al.  Extracting company names from text , 1991, Proceedings of the Seventh IEEE Conference on Artificial Intelligence Applications.

[18]  Yorick Wilks,et al.  Book Reviews: Electric Words: Dictionaries, Computers, and Meanings , 1996, CL.

[19]  Claire Cardie,et al.  University of Massachusetts: Description of the CIRCUS System as Used for MUC-4 , 1992, MUC.

[20]  Udo Hahn On Text Coherence Parsing , 1992, COLING.

[21]  Gian Piero Zarri,et al.  Automatic Representation of the Semantic Relationships Corresponding to a French Surface Expression , 1983, ANLP.

[22]  Nancy Chinchor,et al.  The Multilingual Entity Task (MET) Overview , 1996, TIPSTER.

[23]  Naomi Sager,et al.  Natural Language Information Processing: A Computer Grammar of English and Its Applications , 1980 .

[24]  Ellen Riloff,et al.  Classifying Texts Using Relevancy Signatures , 1992, AAAI.

[25]  Richard M. Schwartz,et al.  Towards Understanding Text with a Very Large Vocabulary , 1990, HLT.

[26]  Charles F. Goldfarb,et al.  SGML handbook , 1990 .

[27]  Beth Sundheim,et al.  A Performance Evaluation of Text-Analysis Technologies , 1991, AI Mag.

[28]  James R. Cowie,et al.  Automatic Analysis of Descriptive Texts , 1983, ANLP.

[29]  E. Riloff,et al.  Automated dictionary construction for information extraction from text , 1993, Proceedings of 9th IEEE Conference on Artificial Intelligence for Applications.

[30]  Yorick Wilks,et al.  Evaluation of an Algorithm for the Recognition and Classification of Proper Names , 1996, COLING.

[31]  Douglas E. Appelt,et al.  SRI: Description of the JV-FASTUS System Used for MUC-5 , 1993, MUC.

[32]  Ralph Grishman,et al.  TIPSTER Text Phase II Architecture Design Version 2.1p 19 June 1996 , 1996, TIPSTER.

[33]  Rémi Zajac,et al.  The Temple Translator's Workstation Project , 1996, TIPSTER.

[34]  James Pustejovsky,et al.  On The Semantic Interpretation of Nominals , 1988, COLING.

[35]  Lisa F. Rau,et al.  SCISOR: extracting information from on-line news , 1990, CACM.

[36]  William C. Ogden,et al.  Oleada: User-Centered TIPSTER Technology for Language Instruction , 1996, TIPSTER.

[37]  Steven L. Lytinen,et al.  ATRANS Automatic Processing of Money Transfer Messages , 1986, AAAI.

[38]  Beth Sundheim,et al.  Survey of the Message Understanding Conferences , 1993, HLT.

[39]  Gerald DeJong,et al.  Prediction and Substantiation: A New Approach to Natural Language Processing , 1979, Cogn. Sci.

[40]  James J. Masanz,et al.  Language Processing , 1998 .