A hybrid approach for large knowledge graphs matching

Matching large and heterogeneous Knowledge Graphs (KGs) remains a challenge in the Semantic Web research community. This work highlights several limitations of current matching methods, including: (1) they depend heavily on string-based similarity measures, and (2) they are primarily built to handle well-formed ontologies. These limitations make them unsuitable for large, (semi-)automatically constructed KGs with hundreds of classes and millions of instances. Such KGs share a remarkable number of complementary facts, often described using different vocabularies. Inspired by the role of instances in large-scale KGs, we propose a hybrid matching approach. Our method combines a string-based matcher with an instance-based matcher that casts schema matching as a two-way text classification task by exploiting the instances of KG classes. The method is domain-independent and can handle KG classes with unbalanced populations. Our evaluation on a real-world KG dataset shows that our method obtains the highest recall and F1 among all OAEI 2020 participants.
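The hybrid idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy Naive Bayes classifier, the helper names (`TinyNaiveBayes`, `instance_match`, `string_match`, `hybrid_match`), the 0.9 string threshold, and the sample data are all hypothetical. In particular, "two-way" is read here as classifying instances in both directions and keeping the class pairs on which both directions agree, which is an assumption the abstract does not confirm. The sketch also makes no attempt to handle unbalanced class populations.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tokens(text):
    return text.lower().split()


class TinyNaiveBayes:
    """Toy multinomial Naive Bayes; a stand-in for a real text classifier."""

    def __init__(self, kg):
        # kg maps a class label to a list of instance descriptions.
        self.counts = {c: Counter(t for s in insts for t in tokens(s))
                       for c, insts in kg.items()}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        self.vocab = set().union(*self.counts.values())

    def predict(self, text):
        # Score only tokens seen in training, with add-one smoothing.
        known = [t for t in tokens(text) if t in self.vocab]
        v = len(self.vocab)

        def score(c):
            return sum(math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
                       for t in known)

        return max(self.counts, key=score)


def instance_match(source, target):
    """Map each source class to the target class that wins a majority vote
    over the predicted labels of the source class's instances."""
    clf = TinyNaiveBayes(target)
    mapping = {}
    for cls, insts in source.items():
        if insts:
            votes = Counter(clf.predict(s) for s in insts)
            mapping[cls] = votes.most_common(1)[0][0]
    return mapping


def string_match(source, target, threshold=0.9):
    """Pair classes whose labels are nearly identical (hypothetical threshold)."""
    return {(a, b) for a in source for b in target
            if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold}


def hybrid_match(kg_a, kg_b):
    # Assumed reading of "two-way": classify in both directions, keep agreements.
    a_to_b = instance_match(kg_a, kg_b)
    b_to_a = instance_match(kg_b, kg_a)
    agreed = {(a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a}
    return agreed | string_match(kg_a, kg_b)


# Made-up sample data, for illustration only.
kg_a = {"Person": ["Luke Skywalker jedi pilot", "Leia Organa rebel leader"],
        "Planet": ["Tatooine desert world", "Hoth ice world"]}
kg_b = {"Character": ["Han Solo rebel pilot", "Obi-Wan Kenobi jedi master"],
        "Planet": ["Endor forest world", "Naboo green world"]}

alignment = hybrid_match(kg_a, kg_b)
print(sorted(alignment))
```

On this toy data the string matcher alone finds only the identically named `Planet` classes, while the instance-based step also pairs `Person` with `Character`, whose labels share almost no characters. This is the kind of vocabulary mismatch the abstract argues string similarity cannot resolve.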
