Linking Semantic Desktop Data to the Web of Data

The goal of the Semantic Desktop is to enable better organization of the personal information on our computers by applying semantic technologies on the desktop. However, desktop information is often incomplete, because it reflects our subjective view of an entity or our limited knowledge about it. The Web of Data, on the other hand, contains information about virtually everything, generally from multiple sources. Connecting the desktop to the Web of Data would thus enrich and complement desktop information, and bringing in this information automatically would relieve the user of the burden of searching for it. In addition, connecting the two networks of data opens up the possibility of advanced personal services on the desktop. Our solution addresses these problems by using a semantic search engine for the Web of Data, such as Sindice, to find and retrieve a relevant subset of entities from the web. We present a matching framework that uses a combination of configurable heuristics and rules to compare data graphs and achieves a high degree of precision in the linking decision. We evaluate our methodology with real-world data: we create a gold standard from relevance judgements by experts and measure the performance of our system against it. We show that it is possible to automatically link desktop data with web data in an effective way.
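
As a rough illustration of what matching with configurable heuristics and evaluating against a gold standard could look like, the following Python sketch scores candidate web entities against a desktop entity and computes precision. It is not the framework described here: the entity representation (a dictionary from property name to a set of values), the function names, the weights, and the threshold are all assumptions made for the example, and string similarity via `difflib` stands in for whatever heuristics the actual system uses.

```python
from difflib import SequenceMatcher
from typing import Dict, Set

# Hypothetical entity description: property name -> set of literal values.
Entity = Dict[str, Set[str]]


def label_similarity(a: Entity, b: Entity, prop: str = "label") -> float:
    """Best case-insensitive string similarity over all value pairs of `prop`."""
    best = 0.0
    for x in a.get(prop, set()):
        for y in b.get(prop, set()):
            ratio = SequenceMatcher(None, x.lower(), y.lower()).ratio()
            best = max(best, ratio)
    return best


def shared_value(a: Entity, b: Entity, prop: str) -> float:
    """1.0 if both entities share at least one value for `prop`, else 0.0."""
    return 1.0 if a.get(prop, set()) & b.get(prop, set()) else 0.0


def match_score(a: Entity, b: Entity, weights: Dict[str, float]) -> float:
    """Weighted average of per-property heuristics (weights are configurable)."""
    total = sum(weights.values())
    score = 0.0
    for prop, weight in weights.items():
        if prop == "label":
            score += weight * label_similarity(a, b)
        else:
            score += weight * shared_value(a, b, prop)
    return score / total if total else 0.0


def link(desktop: Entity, candidates: Dict[str, Entity],
         weights: Dict[str, float], threshold: float = 0.8) -> Set[str]:
    """Return the URIs of candidates whose score reaches the (assumed) threshold."""
    return {uri for uri, cand in candidates.items()
            if match_score(desktop, cand, weights) >= threshold}


def precision(predicted: Set[str], gold: Set[str]) -> float:
    """Fraction of predicted links that appear in the gold standard."""
    return len(predicted & gold) / len(predicted) if predicted else 0.0


# Illustrative data only; the URIs and values are made up for this sketch.
desktop_entity = {"label": {"Tim Berners-Lee"},
                  "homepage": {"http://www.w3.org/People/Berners-Lee/"}}
web_candidates = {
    "ex:a": {"label": {"Tim Berners-Lee"},
             "homepage": {"http://www.w3.org/People/Berners-Lee/"}},
    "ex:b": {"label": {"Tim Burton"}, "homepage": {"http://example.org/"}},
}
weights = {"label": 0.5, "homepage": 0.5}

links = link(desktop_entity, web_candidates, weights)
gold_standard = {"ex:a"}
print(links, precision(links, gold_standard))
```

A high threshold favours precision over recall, which matches the emphasis on a precise linking decision; in practice the weights and threshold would need tuning against expert judgements.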
