LANCE: Piercing to the Heart of Instance Matching Tools

One of the main challenges in the Data Web is identifying instances that refer to the same real-world entity. Choosing the right framework for this purpose remains tedious, because current instance matching benchmarks fail to give end users and developers the insights they need into how existing frameworks behave on real data. In this paper, we present LANCE, a domain-independent instance matching benchmark generator for benchmarking instance matching systems for Linked Data. LANCE is the first Linked Data benchmark generator to support complex semantics-aware test cases that take expressive OWL constructs into account, in addition to the standard test cases based on structure and value transformations. LANCE supports the definition of matching tasks with varying degrees of difficulty and produces a weighted gold standard, which allows a more fine-grained analysis of the performance of instance matching tools. It accepts any linked dataset and its accompanying schema as input and produces a target dataset implementing test cases of varying levels of difficulty. We provide a comparative analysis using LANCE benchmarks to assess and identify the capabilities of state-of-the-art instance matching systems, as well as an evaluation demonstrating the scalability of LANCE's test case generator.
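To make the idea of a weighted gold standard more concrete, the sketch below shows one plausible way such weights could be used when scoring a matcher. It is a minimal illustration under stated assumptions, not LANCE's actual scoring scheme: the link representation, the difficulty weights in (0, 1], the `weighted_scores` function, and the "weighted recall" measure are all hypothetical choices made for this example.

```python
from typing import Dict, Set, Tuple

# Hypothetical link representation: (source instance IRI, target instance IRI).
Link = Tuple[str, str]


def weighted_scores(gold: Dict[Link, float], found: Set[Link]) -> Dict[str, float]:
    """Score a matcher's output against a (hypothetical) weighted gold standard.

    `gold` maps every expected link to an assumed difficulty weight in (0, 1];
    `found` is the set of links returned by the matcher under test.
    Returns unweighted precision and recall, plus a recall in which each
    correctly found link counts in proportion to its weight.
    """
    true_positives = {link for link in found if link in gold}
    precision = len(true_positives) / len(found) if found else 0.0
    recall = len(true_positives) / len(gold) if gold else 0.0
    total_weight = sum(gold.values())
    found_weight = sum(gold[link] for link in true_positives)
    weighted_recall = found_weight / total_weight if total_weight else 0.0
    return {"precision": precision, "recall": recall, "weighted_recall": weighted_recall}


if __name__ == "__main__":
    # Toy data invented for illustration: a higher weight marks a harder test case.
    gold = {
        ("src:1", "tgt:1"): 0.2,  # e.g. a light value transformation
        ("src:2", "tgt:2"): 0.5,  # e.g. a structural transformation
        ("src:3", "tgt:3"): 0.9,  # e.g. a semantics-aware transformation
        ("src:4", "tgt:4"): 0.4,
    }
    # The matcher finds three correct links, misses the hardest one,
    # and returns one spurious link.
    found = {("src:1", "tgt:1"), ("src:2", "tgt:2"), ("src:4", "tgt:4"), ("src:5", "tgt:9")}
    print(weighted_scores(gold, found))
```

In this toy run, plain recall is 0.75 (3 of 4 links), but weighted recall drops to about 0.55 (1.1 of a total weight of 2.0), because the missed link is the hardest one. A fine-grained analysis of this kind separates systems that only handle easy transformations from those that also cope with difficult ones.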
