A Tool For Mapping Concepts Between Two Ontologies

We describe ongoing work that combines the recently emerging semantic markup language DAML+OIL (for ontology specification), text-based classification technology (for collecting similarity information), and Bayesian reasoning (for similarity synthesis and final mapping selection) to provide ontology mapping between two classification hierarchies. This work supports an interactive system for semi-automatically building a mapping from one topic hierarchy into another. The system will be used as part of the ITTALKS [1] system to allow multiple topic ontologies to be used to describe the subjects of talks and the interests of users. The ontology maps developed by our tools can then be used to recognize that a talk described by terms in one ontology might be of interest to a user who has described her interests using terms drawn from another. A more complete description of this work is available at [2].

Ontologies. The two hierarchies we used as examples are the ACM topic ontology and a small ITTALKS topic ontology, which organizes classes of IT-related talks differently from the ACM classification. Both ontologies, as well as the output mappings, are marked up in DAML+OIL. Each concept (class) in an ontology is associated with a set of exemplars, which are URLs pointing to text documents thought to belong to that class.

Text-based classification. The Rainbow text classifier [3] is used to generate similarity scores between concepts in the two ontologies based on their associated exemplar documents. First, a model is built for each ontology; it primarily contains statistical information about the exemplars associated with each concept in that ontology. Then, the similarity score from a concept in ontology B to a concept in ontology A is obtained by comparing the exemplars of the B concept against the model of ontology A. In essence, the score measures the similarity between the exemplars associated with the two concepts.

Bayesian subsumption. A concept in B may (partially) match more than one concept in A, each with a different similarity score. Also, since a non-leaf node is a superclass of its children, its exemplars should include both those associated with it directly and those associated with all of its descendants in the hierarchy. Non-leaf nodes therefore need to synthesize the scores of their descendants before the final mapping can be selected. This is accomplished by a Bayesian extension of the subsumption operation of description logics. In this approach, we assume that all leaves in a hierarchy form a mutually exclusive and exhaustive set, and we take the similarity score directly for any concept that is a leaf.
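The text-based classification step can be illustrated with a small, self-contained sketch. It does not use Rainbow or reproduce its interface; it assumes a plain multinomial naive Bayes model with uniform priors and add-one smoothing, and it assumes that the score from a B concept to each A concept is the average posterior over the B concept's exemplars. The function names (`train_model`, `classify`, `similarity`) and the toy exemplar texts are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def train_model(exemplars: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Build per-concept word counts from the exemplar texts of one ontology."""
    return {
        concept: Counter(w for doc in docs for w in doc.lower().split())
        for concept, docs in exemplars.items()
    }


def classify(model: Dict[str, Counter], doc: str, alpha: float = 1.0) -> Dict[str, float]:
    """Posterior over concepts for one document (naive Bayes, uniform priors)."""
    vocab = set().union(*model.values())
    # Words never seen in any exemplar carry no information and are skipped.
    words = [w for w in doc.lower().split() if w in vocab]
    log_like = {}
    for concept, counts in model.items():
        total = sum(counts.values()) + alpha * len(vocab)
        log_like[concept] = sum(math.log((counts[w] + alpha) / total) for w in words)
    # Normalize in log space to avoid underflow on long documents.
    top = max(log_like.values())
    weights = {c: math.exp(v - top) for c, v in log_like.items()}
    norm = sum(weights.values())
    return {c: w / norm for c, w in weights.items()}


def similarity(model_a: Dict[str, Counter], b_exemplars: List[str]) -> Dict[str, float]:
    """Assumed aggregation: average the posteriors over a B concept's exemplars."""
    totals: Counter = Counter()
    for doc in b_exemplars:
        for concept, p in classify(model_a, doc).items():
            totals[concept] += p
    n = max(len(b_exemplars), 1)
    return {c: t / n for c, t in totals.items()}


# Hypothetical exemplar texts standing in for documents fetched from URLs.
ontology_a = {
    "Databases": ["sql query index transaction", "relational database query"],
    "Machine Learning": ["neural network training", "classifier training data"],
}
b_docs = ["training a neural classifier", "network training data"]
scores = similarity(train_model(ontology_a), b_docs)
best = max(scores, key=scores.get)
```

Because each posterior sums to one, the averaged scores for a single B concept also sum to one across the concepts of A, which is convenient if they are later treated as probabilities.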
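The score synthesis for non-leaf nodes can be sketched as follows. This is one possible reading, not the authors' formulation: if the leaves are mutually exclusive and exhaustive and the leaf scores are treated as probabilities, then the score of a non-leaf node is the sum of the scores of the leaves beneath it. The sketch ignores exemplars attached directly to non-leaf nodes; the hierarchy, the scores, and the name `synthesize_scores` are hypothetical.

```python
from typing import Dict, List


def synthesize_scores(
    children: Dict[str, List[str]],
    leaf_scores: Dict[str, float],
    root: str,
) -> Dict[str, float]:
    """Leaves keep their own score; a non-leaf gets the sum over its descendants' leaves."""
    scores: Dict[str, float] = {}

    def visit(node: str) -> float:
        kids = children.get(node, [])
        if not kids:
            # Leaf: take the classifier score as is (0.0 if none was recorded).
            scores[node] = leaf_scores.get(node, 0.0)
        else:
            # Non-leaf: leaves are assumed mutually exclusive, so scores add.
            scores[node] = sum(visit(k) for k in kids)
        return scores[node]

    visit(root)
    return scores


# Hypothetical fragment of hierarchy A and leaf scores for one concept in B.
hierarchy = {
    "IT": ["AI", "Databases"],
    "AI": ["Machine Learning", "Knowledge Representation"],
}
leaf_scores = {
    "Machine Learning": 0.5,
    "Knowledge Representation": 0.25,
    "Databases": 0.25,
}
scores = synthesize_scores(hierarchy, leaf_scores, "IT")
```

Under this reading the root always receives the full probability mass, so the synthesized score alone cannot choose the final mapping; the selection step has to weigh it against how specific the candidate concept is.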