Systems of concepts and their extraction from text

This work proposes, demonstrates and evaluates a method for seeding terminology collections and, subsequently, models of specific domains, latterly called ontologies. The method is based on analysis of text collections and draws on recent work in international standards for terminology. Populating ontologies in this way is referred to elsewhere as ontology learning. Ontologies are considered by some to be vital to the development of the Semantic Web and its Grid counterpart, and to the emerging, yet elusive, "Knowledge Grids". The results could support the activities of terminologists, document managers, developers of intelligent systems, and other language researchers. The research investigates the population of knowledge bases with systems of concepts extracted from texts in arbitrary domains, a task normally undertaken manually by domain experts. The method relies on identifying evidence of key domain concepts in text: terms are used in place of these concepts, in definitions of these concepts, and to express relationships between concepts. The work may contribute to the Semantic Web and related initiatives by helping to overcome the so-called "Knowledge Acquisition Bottleneck": the well-documented and unsolved AI problem of producing an initial model of an arbitrary specialist domain from background resources without significant hand-crafting effort or the involvement of a domain expert. This bottleneck is usually addressed through extensive interaction with domain experts, including a number of expert interviews. The research explores terminology extraction from domain texts, the need for and use of knowledge representation, and the means by which the two can be combined with international standards for terminology to produce such an initial model of an arbitrary specialist domain.
The initial domain model that results from applying the method can be validated by domain experts, which reduces the expert involvement needed to create the model.
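
As a rough illustration of the idea that terms in definitions can provide evidence of relationships between concepts, the following minimal sketch scans sentences for one cue phrase and collects pairs of narrower and broader terms. It is not the method evaluated in this work: the cue pattern ("is a kind of", "is a type of"), the function name extract_relations, and the sample sentences are all assumptions made for illustration only.

```python
import re

# Assumed cue phrase: "<term> is a kind/type of <broader term>".
# Terms are limited to one or two words to keep the sketch simple.
CUE = re.compile(
    r"(?:\b(?:an?|the)\s+)?(\w+(?: \w+)?)\s+is\s+an?\s+(?:kind|type)\s+of\s+(\w+(?: \w+)?)",
    re.IGNORECASE,
)

def extract_relations(sentences):
    """Return (narrower term, broader term) pairs found via the cue pattern."""
    pairs = []
    for sentence in sentences:
        for match in CUE.finditer(sentence):
            narrower, broader = (group.lower() for group in match.groups())
            pairs.append((narrower, broader))
    return pairs

# Hypothetical sample sentences, not taken from any corpus used in this work.
sample = [
    "A neutron star is a type of compact star.",
    "The weather was mild.",
    "A pulsar is a kind of neutron star.",
]

relations = extract_relations(sample)

# Group narrower terms under each broader term to form a small concept system.
concept_system = {}
for narrower, broader in relations:
    concept_system.setdefault(broader, []).append(narrower)

print(concept_system)
```

A single hand-written pattern like this will miss most relations in real text and will produce false matches. It only shows the shape of the output, a small hierarchy of terms, that a domain expert could then validate.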