Paper information - A hybrid approach for large knowledge graphs matching

A hybrid approach for large knowledge graphs matching

Matching large and heterogeneous Knowledge Graphs (KGs) has been a challenge in the Semantic Web research community. This work highlights a number of limitations of current matching methods: (1) they are highly dependent on string-based similarity measures, and (2) they are primarily built to handle well-formed ontologies. These limitations make them unsuitable for large, (semi-)automatically constructed KGs with hundreds of classes and millions of instances. Such KGs share a remarkable number of complementary facts, often described using different vocabularies. Inspired by the role of instances in large-scale KGs, we propose a hybrid matching approach. Our method combines a string-based matcher with an instance-based matcher that casts the schema matching process as a two-way text classification task by exploiting the instances of KG classes. Our method is domain-independent and is able to handle KG classes with unbalanced populations. Our evaluation on a real-world KG dataset shows that our method obtains the highest recall and F1 among all OAEI 2020 participants.
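To make the idea of a hybrid matcher more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy multinomial Naive Bayes classifier with a uniform class prior as the instance-based matcher, `difflib.SequenceMatcher` as the string-based matcher, and a simple "take the higher score" rule for combining them. All function names, the threshold, and the example data are hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def train_nb(class_to_instances):
    """Count tokens per class; each target KG class acts as one label."""
    token_counts = {cls: Counter() for cls in class_to_instances}
    for cls, instances in class_to_instances.items():
        for text in instances:
            token_counts[cls].update(text.lower().split())
    vocab = {tok for counts in token_counts.values() for tok in counts}
    return token_counts, vocab


def predict_nb(model, text):
    """Most likely class for one instance label (Laplace smoothing, uniform prior)."""
    token_counts, vocab = model
    best_cls, best_score = None, -math.inf
    for cls, counts in token_counts.items():
        total = sum(counts.values()) + len(vocab)
        score = sum(math.log((counts[tok] + 1) / total)
                    for tok in text.lower().split())
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


def instance_scores(source, target):
    """Fraction of each source class's instances classified into each target class."""
    model = train_nb(target)
    scores = {}
    for src_cls, instances in source.items():
        votes = Counter(predict_nb(model, text) for text in instances)
        for tgt_cls, n in votes.items():
            scores[(src_cls, tgt_cls)] = n / len(instances)
    return scores


def string_score(a, b):
    """Character-level similarity of two class labels, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def hybrid_match(source, target, threshold=0.5):
    """Align each source class to its best target class (assumed rule: max of both scores)."""
    inst = instance_scores(source, target)

    def combined(s, t):
        return max(inst.get((s, t), 0.0), string_score(s, t))

    alignment = {}
    for s in source:
        best = max(target, key=lambda t: combined(s, t))
        if combined(s, best) >= threshold:
            alignment[s] = best
    return alignment


# Hypothetical toy KGs: class name -> labels of its instances.
kg_a = {
    "Person": ["luke skywalker jedi", "leia organa princess"],
    "Planet": ["tatooine desert planet", "hoth ice planet"],
    "Starship": [],  # no instances: only the string matcher can help here
}
kg_b = {
    "Character": ["han solo smuggler", "luke skywalker pilot", "leia organa rebel leader"],
    "Location": ["tatooine desert world", "endor forest moon", "hoth ice planet"],
    "Starships": ["millennium falcon freighter"],
}

print(hybrid_match(kg_a, kg_b))
```

In this toy setting, the instance-based scores are what link classes whose names share little surface similarity, while the string-based score covers the class that has no instances. A real system would need a stronger classifier and an explicit strategy for unbalanced class populations, which this sketch does not attempt.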
