CRYSTAL: Inducing a Conceptual Dictionary

One of the central knowledge sources of an information extraction (IE) system IS a dictionary of linguistic patterns that can be used to identify references to relevant information in a text Automatic creation of conceptual dictionaries is important for portability and scalability of an IE system This paper describes CRYSTAL, a system which automatically induces a dictionary of "concept-node definitions" sufficient to identify relevant information from a training corpus Each of these concept-node definitions is generalized as far as possible without producing errors, so that a minimum number of dictionary entries cover the positive training instances Because it tests the accuracy of each proposed definition, CRYSTAL can often surpass human intuitions in creating reliable extraction rules.