Ontology Alignment Based on Word Embedding and Random Forest Classification

Ontology alignment is crucial for integrating heterogeneous data sources and is an important component of the semantic web. Accordingly, several ontology alignment techniques have been proposed for discovering correspondences between the concepts (or entities) of different ontologies. Most alignment techniques depend on string-based similarities, which cannot handle the vocabulary mismatch problem, where the same concept is named with different words in different ontologies. In addition, deciding which similarity measures to use and how to combine them effectively in alignment systems remains a persistent challenge in this area. In this work, we introduce a random forest classifier approach for ontology alignment that relies on word embedding to compute a variety of semantic similarity features between concepts. Specifically, we combine string-based and semantic similarity measures to form feature vectors, which the classifier model uses to determine whether two concepts align. By harnessing background knowledge and relying on minimal information from the ontologies, our approach can handle knowledge-light ontological resources. It also eliminates the need to learn the aggregation weights of a composition of similarity measures. Experiments using the Ontology Alignment Evaluation Initiative (OAEI) dataset and real-world ontologies highlight the utility of our approach and show that it can outperform state-of-the-art alignment systems. Code related to this paper is available at: https://bitbucket.org/paravariar/rafcom.
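
As a rough illustration of the feature-vector idea described above, the following minimal sketch builds a two-element vector (one string-based similarity, one embedding-based similarity) for a pair of concept labels. It is not the implementation from the linked repository: the function names, the choice of `difflib.SequenceMatcher` as the string measure, the label-averaging scheme, and the toy three-dimensional embedding values are all assumptions made for illustration. In practice the vectors would come from a pretrained word embedding model, and the resulting feature vectors, together with known alignments as labels, would be used to train a random forest classifier.

```python
import math
from difflib import SequenceMatcher

# Toy 3-dimensional vectors standing in for pretrained word embeddings.
# The words and values are invented for illustration only.
TOY_EMBEDDINGS = {
    "author": [0.9, 0.1, 0.0],
    "writer": [0.8, 0.2, 0.1],
    "paper": [0.1, 0.9, 0.2],
    "article": [0.2, 0.8, 0.3],
    "review": [0.1, 0.3, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def label_vector(label, embeddings):
    """Average the vectors of the label's known tokens; None if no token is known."""
    vectors = [embeddings[t] for t in label.lower().split() if t in embeddings]
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def string_similarity(a, b):
    """String-based feature: case-insensitive SequenceMatcher ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def semantic_similarity(a, b, embeddings):
    """Embedding-based feature: cosine of averaged label vectors, 0.0 if unknown."""
    va, vb = label_vector(a, embeddings), label_vector(b, embeddings)
    if va is None or vb is None:
        return 0.0
    return cosine(va, vb)


def feature_vector(a, b, embeddings=TOY_EMBEDDINGS):
    """Combine both similarity measures into one feature vector for a concept pair."""
    return [string_similarity(a, b), semantic_similarity(a, b, embeddings)]


if __name__ == "__main__":
    for pair in [("Author", "Writer"), ("Paper", "Article"), ("Author", "Review")]:
        print(pair, feature_vector(*pair))
```

With these toy values, "Author" and "Writer" share few characters but have nearby vectors, which is the kind of vocabulary mismatch a purely string-based measure would miss. Because the classifier learns from the whole feature vector, no hand-tuned weights for combining the two measures are needed.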
