Unsupervised Link Discovery through Knowledge Base Repair

The Linked Data Web has developed into a compendium of datasets, some of them very large. Devising efficient approaches to compute links between these datasets is thus central to achieving the vision behind the Data Web. Unsupervised approaches to this task have emerged over the last few years. Yet, so far, none of these unsupervised approaches exploits the replication of resources across several knowledge bases to improve its linking accuracy. In this paper, we present Colibri, an iterative unsupervised approach for link discovery. Colibri discovers links between n datasets (n ≥ 2) while improving the quality of the instance data in these datasets. To this end, Colibri combines error detection and correction with unsupervised link discovery. We evaluate our approach on five benchmark datasets with respect to the F-score it achieves. Our results suggest that Colibri can significantly improve the results of unsupervised machine-learning approaches for link discovery while correctly detecting erroneous resources.
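The abstract does not spell out how replication across knowledge bases exposes errors. One plausible reading is a transitivity check over three datasets. If resource a is linked to b and b is linked to c, then a's direct link into the third dataset should also be c. A disagreement signals that at least one of the three links, or the data of one of the involved resources, may be wrong. The sketch below illustrates only that idea under our own assumptions: links are one-to-one, they are stored per ordered pair of datasets (i, j) with i < j, and the function name `find_inconsistent_triples` is hypothetical. It is not Colibri's published algorithm.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical representation (an assumption, not Colibri's data model):
# links[(i, j)] with i < j maps a resource in dataset i to the single
# resource in dataset j it is linked to (one-to-one links).
Links = Dict[Tuple[int, int], Dict[str, str]]


def find_inconsistent_triples(links: Links, n: int) -> List[Tuple[str, str, str, str]]:
    """Return (a, b, c, c_direct) where the path a -> b -> c disagrees with the direct link a -> c_direct."""
    conflicts = []
    # Every triple of distinct datasets i < j < k is checked once.
    for i, j, k in combinations(range(n), 3):
        ij = links.get((i, j), {})
        jk = links.get((j, k), {})
        ik = links.get((i, k), {})
        for a, b in ij.items():
            c = jk.get(b)
            c_direct = ik.get(a)
            # Only disagreements between two existing paths are flagged;
            # missing links are ignored in this sketch.
            if c is not None and c_direct is not None and c != c_direct:
                conflicts.append((a, b, c, c_direct))
    return conflicts


# Usage example with made-up identifiers.
links = {
    (0, 1): {"d0:a1": "d1:b1", "d0:a2": "d1:b2"},
    (1, 2): {"d1:b1": "d2:c1", "d1:b2": "d2:c2"},
    (0, 2): {"d0:a1": "d2:c1", "d0:a2": "d2:c9"},
}
print(find_inconsistent_triples(links, 3))
# → [('d0:a2', 'd1:b2', 'd2:c2', 'd2:c9')]
```

With only two datasets (n = 2), there is no third dataset to cross-check against, so the function returns an empty list. This is one reason why linking n ≥ 2 datasets jointly can offer more evidence than linking pairs in isolation.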
