Automated knowledge derivation: Domain‐independent techniques for domain‐restricted text sources

This article describes the major components of a system that builds and updates a knowledge base by extracting knowledge from natural language text. The knowledge extraction is done in a domain‐independent manner and does not rely on particular vocabulary or grammar constructions. The only restriction is that the input text must be technical text from some specific problem domain. An important capability of the system is that it can bootstrap itself: given only a description of the types of objects and relationships to be stored, the system can start with an empty knowledge base and build it as it processes the text. The system's success in extracting knowledge from various input texts was evaluated using scoring metrics reported by Lehnert and Sundheim [AI Mag., 12(3), 81–94 (1991)]. The initial results indicate that the knowledge extraction mechanism is both effective and independent of a particular author's writing style or a particular domain. © 1995 John Wiley & Sons, Inc.
