KeyRanker: Automatic RDF Key Ranking for Data Linking

Automatic approaches to key discovery on RDF datasets generate sets of discriminative properties that can be used to configure data linking systems relying on link specifications. These keys are often numerous. They are generated independently for each of the two datasets to be linked, and they come without any assessment of their usefulness for the linking task. We propose a novel generic algorithm that selects keys valid in both datasets and ranks them by their individual likelihood of generating identity links. In addition, we explore combining several complementary keys to improve on their individual performance. We evaluate our approach on diverse synthetic and real-world benchmark data, showing that it is robust across different linking tools and domains.
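The abstract does not state the exact ranking criterion, so the following is only a minimal Python sketch of the three ideas it names. These are checking that a candidate key is valid in both datasets, ranking the valid keys, and combining several keys. It rests on several assumptions. Each dataset is a toy in-memory map from instance URI to property to set of values, and property names are already aligned across the two datasets. The ranking score, the product of a key's coverage in each dataset, is hypothetical. Links are produced by exact value overlap, which is a simple stand-in for a real linking tool. The functions `is_key`, `coverage`, `rank_keys`, `link_with_key` and `combine` are illustrative names, not KeyRanker's API or scoring function.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Assumed toy representation: RDF properties may be multi-valued.
Instance = Dict[str, Set[str]]   # property name -> set of values
Dataset = Dict[str, Instance]    # instance URI -> description
Key = FrozenSet[str]             # a key is a set of property names


def described_by(inst: Instance, key: Key) -> bool:
    """True if the instance has at least one value for every key property."""
    return all(inst.get(p) for p in key)


def is_key(data: Dataset, key: Key) -> bool:
    """A key holds if no two distinct instances share a value on every key property."""
    described = [inst for inst in data.values() if described_by(inst, key)]
    for a, b in combinations(described, 2):
        if all(a[p] & b[p] for p in key):
            return False
    return True


def coverage(data: Dataset, key: Key) -> float:
    """Fraction of instances that have values for all properties of the key."""
    if not data:
        return 0.0
    return sum(described_by(inst, key) for inst in data.values()) / len(data)


def rank_keys(keys: List[Key], d1: Dataset, d2: Dataset) -> List[Tuple[Key, float]]:
    """Keep keys valid in both datasets and rank them by a hypothetical score:
    the product of their coverage in each dataset (ties broken by property names)."""
    scored = [(k, coverage(d1, k) * coverage(d2, k))
              for k in set(keys) if is_key(d1, k) and is_key(d2, k)]
    return sorted(scored, key=lambda ks: (-ks[1], sorted(ks[0])))


def link_with_key(key: Key, d1: Dataset, d2: Dataset) -> Set[Tuple[str, str]]:
    """Link cross-dataset pairs that share a value on every key property (exact match)."""
    return {(u1, u2)
            for u1, i1 in d1.items() for u2, i2 in d2.items()
            if described_by(i1, key) and described_by(i2, key)
            and all(i1[p] & i2[p] for p in key)}


def combine(keys: List[Key], d1: Dataset, d2: Dataset) -> Set[Tuple[str, str]]:
    """Union of the links produced by several (ideally complementary) keys."""
    links: Set[Tuple[str, str]] = set()
    for k in keys:
        links |= link_with_key(k, d1, d2)
    return links


# Usage on two tiny hypothetical datasets.
d1 = {
    "a1": {"name": {"Alice"}, "birth": {"1990"}},
    "a2": {"name": {"Bob"}, "birth": {"1985"}},
    "a3": {"name": {"Carol"}},
}
d2 = {
    "b1": {"name": {"Alice"}, "birth": {"1990"}},
    "b2": {"name": {"Bob"}},
    "b3": {"email": {"c@example.org"}, "birth": {"1970"}},
}
keys = [frozenset({"name"}), frozenset({"birth"}), frozenset({"name", "birth"})]

ranked = rank_keys(keys, d1, d2)
for k, score in ranked:
    print(sorted(k), round(score, 3))
print(sorted(combine([k for k, _ in ranked[:2]], d1, d2)))
```

In this toy run, `{name}` ranks first because it describes every instance of `d1` and two of three in `d2`. The union of the two top keys links `(a2, b2)`, which the `{birth}` key alone would miss. This illustrates, under the stated assumptions, why combining complementary keys can help.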
