Defining Key Semantics for the RDF Datasets: Experiments and Evaluations

Many techniques have recently been proposed to automate the linkage of RDF datasets. Predicate selection is the step of the linkage process that consists of selecting the smallest set of relevant predicates needed to enable instance comparison. We call such a set of predicates a key, by analogy with the notion of a key in relational databases. We formally explain the different assumptions behind two existing key semantics. We then evaluate the keys experimentally by studying how the discovered keys can help with dataset interlinking or cleaning. We discuss the experimental results and show that the two semantics lead to comparable results on the studied datasets.
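The Python sketch below makes the notion of a key, and the role its semantics plays, more concrete. It checks candidate predicate sets over a toy in-memory dataset. The abstract does not name the two semantics. As an assumption, the sketch uses two interpretations that differ on multi-valued predicates. Under a "some" semantics, two instances agree on a predicate if they share at least one value. Under a "forall" semantics, their value sets must be identical. The data model, the function names (`agree_some`, `agree_forall`, `violations`, `minimal_keys`, `link`), the sample data and the naive exhaustive search are all hypothetical illustrations. They are not the systems evaluated in this work.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical model: instance URI -> {predicate -> set of object values}.
# RDF predicates may be multi-valued, which is where the two (assumed) semantics differ.
Instance = Dict[str, Set[str]]
Dataset = Dict[str, Instance]
Agree = Callable[[Instance, Instance, Sequence[str]], bool]


def agree_some(a: Instance, b: Instance, preds: Sequence[str]) -> bool:
    """Assumed 'some' semantics: at least one shared value for every predicate."""
    return all(a.get(p, set()) & b.get(p, set()) for p in preds)


def agree_forall(a: Instance, b: Instance, preds: Sequence[str]) -> bool:
    """Assumed 'forall' semantics: identical, non-empty value sets for every predicate."""
    return all(bool(a.get(p)) and a.get(p) == b.get(p) for p in preds)


def violations(data: Dataset, preds: Sequence[str], agree: Agree) -> List[Tuple[str, str]]:
    """Pairs of distinct instances agreeing on all preds (possible duplicates or errors)."""
    return [(i, j) for i, j in combinations(data, 2) if agree(data[i], data[j], preds)]


def minimal_keys(data: Dataset, agree: Agree, max_size: int = 3) -> List[Tuple[str, ...]]:
    """Naive level-wise search for the smallest predicate sets with no violations."""
    preds = sorted({p for inst in data.values() for p in inst})
    keys: List[Tuple[str, ...]] = []
    for size in range(1, max_size + 1):
        for combo in combinations(preds, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # a superset of a key is also a key, but not a minimal one
            if not violations(data, combo, agree):
                keys.append(combo)
    return keys


def link(source: Dataset, target: Dataset, key: Sequence[str], agree: Agree) -> List[Tuple[str, str]]:
    """Candidate links: cross-dataset pairs that agree on every predicate of the key."""
    return [(s, t) for s in source for t in target if agree(source[s], target[t], key)]


# Hypothetical toy data; "email" is multi-valued for ex:p1.
people: Dataset = {
    "ex:p1": {"name": {"Ann Lee"}, "born": {"1980"}, "email": {"ann@a.org", "al@b.org"}},
    "ex:p2": {"name": {"Ann Lee"}, "born": {"1975"}, "email": {"al@b.org"}},
    "ex:p3": {"name": {"Bob Roy"}, "born": {"1980"}, "email": {"bob@d.org"}},
}
other: Dataset = {
    "db:x": {"name": {"Ann Lee"}, "born": {"1975"}},
    "db:y": {"name": {"Bob Roy"}, "born": {"1980"}},
}

print(minimal_keys(people, agree_some))    # [('born', 'email'), ('born', 'name')]
print(minimal_keys(people, agree_forall))  # [('email',), ('born', 'name')]
print(violations(people, ["name"], agree_some))  # [('ex:p1', 'ex:p2')]
print(link(people, other, ("born", "name"), agree_some))  # [('ex:p2', 'db:x'), ('ex:p3', 'db:y')]
```

On this toy data the two semantics disagree only because `email` is multi-valued. `ex:p1` and `ex:p2` share one address, so `email` alone is a key under the "forall" reading but not under the "some" reading. The pairs returned by `violations` for a predicate set expected to be a key point to duplicates or data errors, which relates to cleaning. `link` uses a key to propose `owl:sameAs` candidates across datasets, which relates to interlinking. The exhaustive search grows exponentially with the number of predicates, so practical key discovery would need much stronger pruning.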
