BECKEY: Understanding, comparing and discovering keys of different semantics in knowledge bases

Abstract Integrating data from different knowledge bases has been one of the most important tasks in the Semantic Web in recent years. Keys have proven very useful for the data linking task. A set of properties is a key if it uniquely identifies every resource in the data. To cope with incomplete data, three different key semantics have been proposed so far. We propose BECKEY, a semantics-agnostic approach that discovers keys under all three semantics and scales to large datasets. Our approach can also discover keys in the presence of erroneous data or duplicates (i.e., almost keys). We provide a formalisation of the three semantics along with the relations among them, and we report an extended experimental comparison of the three key semantics. The results allow a better understanding of the three semantics and provide insights into when each one is more appropriate for data linking.
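As an illustration of the notions of key and almost key used above, the following is a minimal sketch and not the BECKEY algorithm. It assumes one simple reading of "uniquely identifies": two resources collide when they have identical, non-empty value sets for every property in the candidate set, and resources lacking a value for some property are skipped. The names `colliding_resources`, `is_key`, and `max_exceptions`, as well as the toy data, are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def colliding_resources(data, properties):
    """Return the resources that share all their values on `properties`
    with at least one other resource.

    Assumption (not from the paper): a resource with no value for one of
    the properties is skipped rather than treated as a collision.
    """
    groups = defaultdict(list)
    for resource, description in data.items():
        signature = tuple(frozenset(description.get(p, ())) for p in properties)
        if any(not values for values in signature):
            continue  # incomplete description: ignored in this sketch
        groups[signature].append(resource)
    return {r for members in groups.values() if len(members) > 1 for r in members}


def is_key(data, properties, max_exceptions=0):
    """True if at most `max_exceptions` resources violate uniqueness.

    With max_exceptions=0 this is a strict key check; a positive value
    gives a simple notion of almost key.
    """
    return len(colliding_resources(data, properties)) <= max_exceptions


# Toy data: resource -> property -> set of values (hypothetical).
data = {
    "p1": {"name": {"Ann"}, "birth": {"1980"}},
    "p2": {"name": {"Ann"}, "birth": {"1975"}},
    "p3": {"name": {"Bob"}, "birth": {"1980"}},
}

name_is_key = is_key(data, ["name"])
name_birth_is_key = is_key(data, ["name", "birth"])
name_is_almost_key = is_key(data, ["name"], max_exceptions=2)
```

In this toy dataset `name` alone is not a key because two resources share the value "Ann", while the pair `name`, `birth` is. Tolerating two exceptions makes `name` an almost key under this sketch's definition. How missing values are treated is exactly where key semantics can differ, so the skip rule here is only one possible choice.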
