Domain-specific ontology mapping by corpus-based semantic similarity

Mapping heterogeneous ontologies is usually performed manually by domain experts, or by computer programs that compare the structures of the ontologies and the linguistic semantics of their concepts. In this work, we take a different approach: we compare and map the concepts of heterogeneous domain-specific ontologies by using a document corpus, drawn from a domain similar to that of the ontologies, as a bridge. Cosine similarity and the Jaccard coefficient, two vector-based similarity measures commonly used in information retrieval, are adopted to measure semantic similarity between ontology concepts. Additionally, the market basket model is modified to serve as a relatedness analysis measure for ontology mapping. We use regulations as the bridging document corpus and take the hierarchical structure of the corpus into account when comparing concept similarity. Preliminary results are obtained using ontologies from the architectural, engineering and construction (AEC) industry. In these results, the proposed market basket model appears to outperform the other two similarity measures, and its prediction error is reduced when corpus structural information is used.
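
As an illustration of the two vector-based measures named above, the following is a minimal sketch, not the authors' implementation. It assumes each concept is represented by a vector of occurrence counts over sections of the bridging corpus; the concept names, the counts, and the function names are hypothetical. The modified market basket model is not sketched because the abstract does not define it.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two concept occurrence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a concept that never occurs is similar to nothing
    return dot / (norm_u * norm_v)

def jaccard_coefficient(u, v):
    """Shared corpus sections divided by sections containing either concept."""
    sections_u = {i for i, count in enumerate(u) if count > 0}
    sections_v = {i for i, count in enumerate(v) if count > 0}
    union = sections_u | sections_v
    if not union:
        return 0.0
    return len(sections_u & sections_v) / len(union)

# Hypothetical counts of three concepts over four corpus sections.
beam = [2, 0, 1, 0]    # concept from ontology A
girder = [1, 0, 1, 0]  # concept from ontology B
door = [0, 3, 0, 0]    # concept from ontology B

print(cosine_similarity(beam, girder), jaccard_coefficient(beam, girder))
print(cosine_similarity(beam, door), jaccard_coefficient(beam, door))
```

In this toy data, the first pair of concepts co-occurs in the same sections and scores high on both measures, while the second pair never co-occurs and scores zero; a mapping would pair each concept with its highest-scoring counterpart in the other ontology.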
