Bootstrapping Object Coreferencing on the Semantic Web

An object on the Semantic Web is likely to be denoted by several URIs minted by different parties. Object coreferencing is the process of identifying “equivalent” URIs that denote the same object, which helps achieve a better Data Web. In this paper, we propose a bootstrapping approach to object coreferencing on the Semantic Web. For a given object URI, we first establish a kernel of semantically equivalent URIs derived from owl:sameAs, (inverse) functional properties and (max-)cardinalities, and then extend the kernel based on the textual descriptions of URIs (e.g., labels and local names). We also propose a trustworthiness-based method for ranking the coreferent URIs in the kernel and a similarity-based method for ranking the URIs in its extension. We implement the approach, called ObjectCoref, on a large-scale dataset of 76 million URIs collected by the Falcons search engine up to 2008. An evaluation of precision, relative recall and response time demonstrates the feasibility of the approach. Additionally, we apply the approach to investigate how widespread URI aliasing is on the current Semantic Web.
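The approach has two phases: a kernel built from hard semantic evidence, then an extension driven by textual similarity. Below is a minimal Python sketch of both phases, assuming in-memory (subject, predicate, object) triples. It is not the ObjectCoref implementation. The union-find fixpoint, the camelCase tokenizer, Jaccard similarity over labels and local names, the 0.5 threshold, the toy data and every helper name are illustrative assumptions, and the trustworthiness-based ranking of kernel URIs is not modeled.

```python
import re

# Standard vocabulary URIs; foaf:mbox is declared inverse functional in FOAF.
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_MBOX = "http://xmlns.com/foaf/0.1/mbox"


class UnionFind:
    """Disjoint sets of URIs judged to denote the same object."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[rb] = ra
        return True


def build_kernel(uri, triples, functional=(), inverse_functional=()):
    """Hypothetical kernel step: close sameAs/FP/IFP evidence to a fixpoint.
    A property restricted by owl:maxCardinality 1 can go in `functional`."""
    uf = UnionFind()
    for s, p, o in triples:
        if p == OWL_SAMEAS:
            uf.union(s, o)
    changed = True
    while changed:  # a merge can enable further FP/IFP merges, so iterate
        changed = False
        seen = {}
        for s, p, o in triples:
            keys = []
            if p in functional:  # same subject + same FP => same value
                keys.append((("FP", uf.find(s), p), o))
            if p in inverse_functional:  # same IFP value => same subject
                keys.append((("IFP", p, uf.find(o)), s))
            for key, member in keys:
                if key not in seen:
                    seen[key] = member
                elif uf.union(seen[key], member):
                    changed = True
    root = uf.find(uri)
    return {x for x in list(uf.parent) if uf.find(x) == root}


def local_name(u):
    """Text after the last '/' or '#' of a URI."""
    return re.split(r"[/#]", u.rstrip("/#"))[-1]


def tokens(text):
    """Lower-cased word tokens; camelCase is split first."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def description(u, triples):
    """Token set of a URI's local name plus its rdfs:label values."""
    words = tokens(local_name(u))
    for s, p, o in triples:
        if s == u and p == RDFS_LABEL:
            words |= tokens(o)
    return words


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def extend_kernel(kernel, candidates, triples, threshold=0.5):
    """Hypothetical extension step: rank non-kernel URIs by their best
    textual similarity to any kernel URI, keeping those above threshold."""
    kernel_descs = [description(u, triples) for u in kernel]
    scored = []
    for c in candidates:
        if c in kernel:
            continue
        d = description(c, triples)
        sim = max((jaccard(d, k) for k in kernel_descs), default=0.0)
        if sim >= threshold:
            scored.append((c, sim))
    return sorted(scored, key=lambda cs: (-cs[1], cs[0]))


# Invented toy data exercising both phases.
triples = [
    ("http://a.example.org/JaneDoe", OWL_SAMEAS, "http://b.example.org/jd"),
    ("http://b.example.org/jd", FOAF_MBOX, "mailto:jane@example.org"),
    ("http://c.example.org/person/42", FOAF_MBOX, "mailto:jane@example.org"),
    ("http://c.example.org/person/42", RDFS_LABEL, "Jane Quinn Doe"),
    ("http://d.example.org/Jane_Quinn_Doe", RDFS_LABEL, "Jane Quinn Doe"),
    ("http://e.example.org/Jane_Roe", RDFS_LABEL, "Jane Roe"),
]
kernel = build_kernel("http://a.example.org/JaneDoe", triples,
                      inverse_functional={FOAF_MBOX})
extension = extend_kernel(kernel, {s for s, _, _ in triples}, triples)
print(extension)  # [('http://d.example.org/Jane_Quinn_Doe', 0.75)]
```

Here the kernel contains the JaneDoe, jd and person/42 URIs (linked by owl:sameAs and a shared foaf:mbox), while Jane_Roe scores only 1/3 and is left out. The fixpoint loop matters because merging two subjects can make their functional-property values coincide, which can trigger further merges that a single pass would miss. Scoring each candidate against its best-matching kernel URI, rather than against the pooled kernel vocabulary, keeps a large kernel from diluting the similarity.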
