Automatically generating data linkages using class-based discriminative properties

Abstract A challenge for Linked Data is to link instances from different data sources that denote the same real-world object. Millions of high-quality owl:sameAs linkages have been generated, but a considerable number of potential linkages remain undiscovered. Traditional similarity-based methods for this data linkage problem do not scale well, since they exhaustively compare every pair of instances. In this paper, we propose an automatic approach to data linkage generation for Linked Data. Specifically, a highly accurate training set is automatically generated based on equivalence reasoning and common prefix blocking. The contexts of the instances in the training set are extracted and matched pairwise in order to learn discriminative property pairs that support linkage discovery. For a particular class pair and pay-level-domain pair, the discriminability of each property pair is measured, and a few property pairs with high discriminability are aggregated so that they can be reused to link further instances between the same classes and domains. The experimental results show that our approach achieves good accuracy compared with several more complex methods on two OAEI tests and the BTC2011 dataset.
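As a rough illustration of measuring the discriminability of property pairs, the following Python sketch scores each property pair from a small labelled training set. It is not the paper's definition: here discriminability is assumed to be the fraction of value matches on a property pair that occur in linked (rather than non-linked) instance pairs. The function name `property_pair_scores`, the exact-match comparison after case folding, and the sample property names and values are all hypothetical.

```python
from collections import Counter
from itertools import product


def property_pair_scores(linked, non_linked):
    """Score property pairs by how well a value match indicates a linkage.

    Each argument is a list of (a, b) tuples, where a and b are dicts
    mapping property names to string values (the instance "contexts").
    Assumed measure: matches on linked pairs / matches on all pairs.
    """
    pos, neg = Counter(), Counter()
    for counter, pairs in ((pos, linked), (neg, non_linked)):
        for a, b in pairs:
            # Compare every property of a with every property of b.
            for (p1, v1), (p2, v2) in product(a.items(), b.items()):
                if v1.strip().lower() == v2.strip().lower():
                    counter[(p1, p2)] += 1
    # Only property pairs that matched on at least one linked pair are scored.
    return {pair: pos[pair] / (pos[pair] + neg[pair]) for pair in pos}


# Hypothetical instance contexts from two sources.
a1 = {"rdfs:label": "Berlin", "ex:country": "Germany"}
b1 = {"foaf:name": "berlin", "ex:nation": "Germany"}
a2 = {"rdfs:label": "Munich", "ex:country": "Germany"}
b2 = {"foaf:name": "Munich", "ex:nation": "Germany"}

scores = property_pair_scores(
    linked=[(a1, b1), (a2, b2)],
    non_linked=[(a1, b2)],
)
# Keep the highest-scoring property pairs for reuse on the same class pair.
best = sorted(scores, key=scores.get, reverse=True)
```

In this toy data the label/name pair matches only on linked instances, whereas the country/nation pair also matches on a non-linked pair and therefore scores lower, which is the intuition behind keeping only a few highly discriminative property pairs.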
