Statistical Matching of Two Ontologies

Standardizing ontologies is a challenging task. Ontologies have been created based on different backgrounds, for different purposes and by different people. However, standardizing them is useful not only for applications, such as Machine Translation and Information Retrieval, but also to improve the ontologies themselves. During the process of standardization, people can find bugs or gaps in ontologies. So standardization brings benefits compared to just using them separately. There is a committee for standardizing ontologies at ANSI, the "ANSI Ad-Hoc Group for Ontology Standards" (Hovy 1996).

Although there have been a few attempts to merge and compare ontologies, this work is still at a preliminary stage of research. (Ogino et al. 1997) attempted manual merging of EDR (EDR 1996) (Miyoshi et al. 1996) and WordNet (Wordnet) (Miller 1995); (Utiyama and Hashida 1997) used statistical methods to merge EDR and WordNet. (Pangloss) is also working on standardizing ontologies.

It is certain that manual methods have great difficulty in matching entire ontologies. It would require three thousand years for a person to check all possible node pairings, if the two ontologies have 40,000 nodes each and each judgement takes a minute. So automatic methods are needed to find matches or at least to narrow down the candidates for matching.

In this paper, we investigate a simple statistical method for matching two ontologies. The method can be applied to any ontologies which are formulated from is-a relationships. In our experiments, we used EDR and WordNet.

This work is similar to the work in (Utiyama and Hashida 1997). They defined the task as the MWM (Maximum Weight Match) of bipartite graphs, an approach which is basically common to most ontology matching schemes. The information they used is partially fuzzy, i.e., for calculating the distance between two nodes, they used the information from each node and its neighborhood, without distinguishing between information from parent and child nodes. However, since the structure of the ontologies (the relation between parent and children) is significant, it might be better to utilize such structural information. In our experiments, we will focus on this issue, rather than trying to achieve a higher performance. The importance of parent, child and grandchild information will be examined. We will conduct several experiments with or without some of the information. It is also important to discover what weighting balance gives good matches.
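The three-thousand-year estimate for manual matching can be checked with a short calculation. The sketch below assumes that every node of one ontology is paired with every node of the other, and that the person works around the clock in 365-day years; both are our assumptions, and the function name is purely illustrative.

```python
# Back-of-the-envelope check of the manual-matching estimate.
# Assumptions (not stated in the paper): exhaustive pairing of all nodes,
# continuous work with no breaks, and a 365-day year.

NODES_PER_ONTOLOGY = 40_000   # size of each ontology, as given in the text
MINUTES_PER_JUDGEMENT = 1     # time for one pair judgement, as given in the text


def years_to_check_all_pairs(n_a, n_b, minutes_per_pair):
    """Return the years needed to judge every (node_a, node_b) pair."""
    pairs = n_a * n_b                      # 1.6 billion pairs for 40,000 x 40,000
    total_minutes = pairs * minutes_per_pair
    minutes_per_year = 60 * 24 * 365
    return total_minutes / minutes_per_year


years = years_to_check_all_pairs(
    NODES_PER_ONTOLOGY, NODES_PER_ONTOLOGY, MINUTES_PER_JUDGEMENT
)
print(round(years))  # 3044
```

Under these assumptions the result is a little over three thousand years, which agrees with the figure quoted above; with ordinary working hours the figure would be several times larger.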
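To make the maximum weight match formulation concrete, the following is a minimal sketch on a toy example. It is not the procedure of (Utiyama and Hashida 1997) nor the method of this paper: the score matrix is made up for illustration, the function name `max_weight_match` is hypothetical, and the exhaustive search over permutations is only feasible for a handful of nodes.

```python
from itertools import permutations


def max_weight_match(weights):
    """Brute-force maximum weight matching on a square score matrix.

    weights[i][j] is an illustrative similarity score between node i of
    ontology A and node j of ontology B. Returns the best total score and
    the list of matched (i, j) pairs.
    """
    n = len(weights)
    best_score, best_perm = float("-inf"), None
    # Each permutation assigns every node of A to a distinct node of B.
    for perm in permutations(range(n)):
        score = sum(weights[i][perm[i]] for i in range(n))
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, list(enumerate(best_perm))


# Toy integer scores (invented for this example).
toy_scores = [
    [9, 8, 1],
    [8, 1, 1],
    [1, 2, 7],
]

score, pairs = max_weight_match(toy_scores)
print(score, pairs)  # 23 [(0, 1), (1, 0), (2, 2)]
```

The example shows why the task is posed as a global optimization: picking the single best pair first (node 0 with node 0, score 9) forces a poor match for node 1, whereas the best overall assignment has total score 23. The exhaustive search here grows factorially with the number of nodes, so ontologies of realistic size would need a polynomial-time assignment algorithm or an approximation.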