Some entities are more equal than others: statistical methods to consolidate Linked Data

Abstract. We propose a method for consolidating entities in RDF data on the Web. Our approach relies on a statistical analysis of how predicates and their associated values are used, in order to identify “quasi”-key properties. Compared to a purely symbolic approach, we obtain promising results, retrieving more identical entities with high precision. We also argue that our technique scales well, possibly to the size of the current Web of Data, unlike more expensive existing approaches.
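To make the idea of a “quasi”-key property concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes a simple score for each predicate (the number of distinct values divided by the number of distinct subject-value pairs), a hypothetical threshold of 0.6, and illustrative function names and sample triples. Subjects that share a value on a sufficiently key-like predicate are reported as candidates for consolidation.

```python
from collections import defaultdict


def quasi_key_scores(triples):
    """Score each predicate in (0, 1]; 1.0 means every value belongs to one subject.

    Assumed heuristic for illustration: distinct values / distinct (subject, value) pairs.
    """
    pairs = defaultdict(set)   # predicate -> {(subject, value)}
    values = defaultdict(set)  # predicate -> {value}
    for s, p, o in triples:
        pairs[p].add((s, o))
        values[p].add(o)
    return {p: len(values[p]) / len(pairs[p]) for p in pairs}


def candidate_matches(triples, threshold=0.6):
    """Group subjects that share a value on a predicate scoring at or above threshold."""
    scores = quasi_key_scores(triples)
    by_value = defaultdict(set)  # (predicate, value) -> {subjects}
    for s, p, o in triples:
        if scores[p] >= threshold:
            by_value[(p, o)].add(s)
    # Keep only groups with at least two subjects; sort for a stable result.
    return sorted(tuple(sorted(g)) for g in by_value.values() if len(g) > 1)


# Hypothetical sample data: ex:a and ex:b share a mailbox, all three share a type.
SAMPLE = [
    ("ex:a", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:b", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:c", "foaf:mbox", "mailto:carol@example.org"),
    ("ex:a", "rdf:type", "foaf:Person"),
    ("ex:b", "rdf:type", "foaf:Person"),
    ("ex:c", "rdf:type", "foaf:Person"),
]

if __name__ == "__main__":
    print(quasi_key_scores(SAMPLE))
    print(candidate_matches(SAMPLE))
```

In this toy data the mailbox predicate scores well above the type predicate, so only the shared mailbox produces a candidate pair. A purely symbolic approach would need the mailbox predicate to be declared as identifying in a schema, whereas a statistical score of this kind is derived from the data alone.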
