Data Linking for the Semantic Web

By requiring that published datasets link to other existing datasets, the fourth linked data principle ensures a Web of data rather than a set of unconnected data islands. In this paper, the authors propose the term data linking to name the problem of finding equivalent resources on the Web of linked data. Many techniques have been developed to perform data linking, with roots in statistics, databases, natural language processing, and graph theory. The authors begin by providing background information and terminological clarifications related to data linking. They then provide a comprehensive survey of the techniques available for data linking, classified along three criteria: granularity, type of evidence, and source of the evidence. Finally, the authors survey eleven recent data linking tools and classify them according to the surveyed techniques.
