Uncertainty-Sensitive Reasoning for Inferring sameAs Facts in Linked Data

Determining whether two URIs described in Linked Data, in the same RDF dataset or in different ones, refer to the same real-world entity is crucial for building applications that exploit the cross-referencing of open data. A major challenge in data interlinking is to design tools that cope with incomplete and noisy data and that exploit uncertain knowledge. In this paper, we model data interlinking as a problem of reasoning under uncertainty. We introduce a probabilistic framework, based on the semantics of probabilistic Datalog, for modelling and reasoning over uncertain RDF facts and rules. We have designed an algorithm, ProbFR, based on this framework. Experiments on real-world datasets show the usefulness and effectiveness of our approach for data linkage and disambiguation.
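
To make the idea of reasoning over uncertain facts and rules concrete, the following is a minimal sketch and not the ProbFR algorithm itself. It assumes that each fact and each rule carries an independent probability, that a single derivation of a sameAs fact has the product of the probabilities of the rule and facts it uses, and that several derivations are combined with a noisy-or. The URIs, property names, weights, and function names are all hypothetical.

```python
from math import prod

# Hypothetical uncertain facts: (subject, property, value) -> probability.
facts = {
    ("ex:a1", "name", "Marie Curie"): 0.9,
    ("ex:b7", "name", "Marie Curie"): 0.8,
    ("ex:a1", "birthYear", "1867"): 1.0,
    ("ex:b7", "birthYear", "1867"): 0.5,
}

# Hypothetical uncertain rules: "x and y share a value for this property
# implies sameAs(x, y)", each with an assumed rule probability.
rules = {"name": 0.7, "birthYear": 0.2}


def derivation_probabilities(x, y):
    """Return one probability per rule instantiation deriving sameAs(x, y)."""
    probs = []
    for (subject, prop, value), p_x in facts.items():
        if subject != x or prop not in rules:
            continue
        p_y = facts.get((y, prop, value))
        if p_y is not None:
            # Assumption: the rule and both facts are independent events.
            probs.append(rules[prop] * p_x * p_y)
    return probs


def same_as_probability(x, y):
    """Combine derivations with a noisy-or, assuming they are independent."""
    return 1.0 - prod(1.0 - d for d in derivation_probabilities(x, y))


if __name__ == "__main__":
    print(same_as_probability("ex:a1", "ex:b7"))
```

In this toy example the two derivations use disjoint facts and rules, so treating them as independent is consistent with the stated assumptions. When derivations share facts, the noisy-or no longer applies directly, and the probability has to be computed from the combined event expression (provenance) of the derived fact.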
