Comparison of ontology alignment algorithms across single matching task via the McNemar test

Ontology alignment is widely used to find the correspondences between different ontologies in diverse fields. After alignment methods have discovered an alignment, several performance scores are available to evaluate them. The scores require the alignment produced by a method and the reference alignment containing the underlying actual correspondences of the given ontologies. The current trend in alignment evaluation is to put forward a new score and to compare various alignments by juxtaposing their performance scores. However, selecting one performance score among the others for comparison is substantially contentious. On top of that, a claim that one method performs better than another cannot be substantiated solely by comparing scores. In this paper, we propose statistical procedures that enable us to theoretically favor one method over another. The McNemar test is considered a reliable and suitable means for comparing two ontology alignment methods over one matching task. The test applies to a 2 × 2 contingency table, which can be constructed in two different ways based on the alignments, each of which has its own merits and pitfalls. The ways of constructing the contingency table and the various apposite statistics from the McNemar test are elaborated in detail. When more than two alignment methods are compared, the family-wise error rate is inflated by the multiple pairwise tests. Thus, ways of controlling such an error are also discussed. A directed graph visualizes the outcome of the McNemar test in the presence of multiple alignment methods. From this graph, it is readily understood whether one method is better than another or whether their differences are imperceptible. Our investigation of the methods that participated in the anatomy track of OAEI 2016 demonstrates that AML and CroMatcher are the top two methods and DKP-AOM and Alin are the bottom two. Moreover, the Levenshtein and N-gram string-based distances discover the most correspondences, while SMOA and Hamming distance find the fewest.
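
To make the procedure concrete, the following is a minimal sketch of the pipeline summarized above: building the discordant cells of a 2 × 2 table for two methods, computing McNemar p-values, correcting for multiple pairwise comparisons, and deriving the edges of a directed comparison graph. Several choices here are assumptions for illustration and are not taken from the paper: the table is built over the reference correspondences only (one of several possible constructions), the mid-p variant is used for the pairwise decisions, the Holm step-down procedure stands in for family-wise error control, and all function names are hypothetical.

```python
from itertools import combinations
from math import comb, erfc, sqrt


def discordant_counts(align_a, align_b, reference):
    """Discordant cells of the 2 x 2 table (assumed construction):
    each reference correspondence is a paired item, 'hit' if a method found it."""
    found_a = align_a & reference
    found_b = align_b & reference
    only_a = len(found_a - found_b)  # found by A, missed by B
    only_b = len(found_b - found_a)  # found by B, missed by A
    return only_a, only_b


def mcnemar(only_a, only_b):
    """Two-sided McNemar p-values from the discordant counts."""
    n = only_a + only_b
    if n == 0:
        return {"asymptotic": 1.0, "exact": 1.0, "mid_p": 1.0}
    stat = (only_a - only_b) ** 2 / n
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k), X ~ Bin(n, 1/2)
    point = comb(n, k) / 2 ** n                            # P(X = k)
    return {
        # Chi-square survival function with 1 degree of freedom.
        "asymptotic": erfc(sqrt(stat / 2.0)),
        "exact": min(1.0, 2 * tail),
        "mid_p": min(1.0, 2 * tail - point),
    }


def holm(pvalues, alpha=0.05):
    """Holm step-down procedure; returns a reject flag per hypothesis."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if pvalues[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject


def comparison_graph(alignments, reference, alpha=0.05):
    """Edges (better, worse) for pairs whose difference is significant."""
    pairs = list(combinations(sorted(alignments), 2))
    counts = [discordant_counts(alignments[a], alignments[b], reference)
              for a, b in pairs]
    pvals = [mcnemar(x, y)["mid_p"] for x, y in counts]
    edges = []
    for (a, b), (x, y), rej in zip(pairs, counts, holm(pvals, alpha)):
        if rej:
            edges.append((a, b) if x > y else (b, a))
    return edges


# Toy data (fabricated for illustration only, not OAEI results).
reference = {(i, i) for i in range(20)}
alignments = {
    "A": set(reference),
    "B": {(i, i) for i in range(5)},
    "C": {(i, i) for i in range(6)},
}
print(comparison_graph(alignments, reference))
```

In this toy setting, methods with no edge between them are the pairs whose difference the corrected test does not find significant. Only the discordant cells enter the test, so correspondences that both methods find or both miss do not influence the outcome.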
