Dynamic Planning for Link Discovery

With the growth of the number and the size of RDF datasets comes an increasing need for scalable solutions to support the linking of resources. Most Link Discovery frameworks rely on complex link specifications for this purpose. We address the scalability of the execution of link specifications by presenting the first dynamic planning approach for Link Discovery dubbed Condor. In contrast to the state of the art, Condor can re-evaluate and reshape execution plans for link specifications during their execution. Thus, it achieves significantly better runtimes than existing planning solutions while retaining an F-measure of 100%. We quantify our improvement by evaluating our approach on 7 datasets and 700 link specifications. Our results suggest that Condor is up to 2 orders of magnitude faster than the state of the art and requires less than 0.1% of the total runtime of a given specification to generate the corresponding plan.

[1]  Axel-Cyrille Ngonga Ngomo,et al.  An evaluation of models for runtime approximation in link discovery , 2016, WI.

[2]  Enrico Motta,et al.  Unsupervised Learning of Link Discovery Configuration , 2012, ESWC.

[3]  Axel-Cyrille Ngonga Ngomo,et al.  EAGLE: Efficient Active Learning of Link Specifications Using Genetic Programming , 2012, ESWC.

[4]  Heiner Stuckenschmidt,et al.  Results of the Ontology Alignment Evaluation Initiative , 2007 .

[5]  Michael C. Ferris,et al.  A Genetic Algorithm for Database Query Optimization , 1991, ICGA.

[6]  Jeffrey Xu Yu,et al.  Efficient similarity joins for near-duplicate detection , 2011, TODS.

[7]  Andreas Thor,et al.  Evaluation of entity resolution approaches on real-world match problems , 2010, Proc. VLDB Endow..

[8]  Abraham Silberschatz,et al.  Database Systems Concepts , 1997 .

[9]  Markus Nentwig,et al.  A survey of current Link Discovery frameworks , 2016, Semantic Web.

[10]  Jens Lehmann,et al.  Wombat - A Generalization Approach for Automatic Link Discovery , 2017, ESWC.

[11]  Heiner Stuckenschmidt,et al.  Results of the Ontology Alignment Evaluation Initiative 2007 , 2006, OM.

[12]  Richard R. Muntz,et al.  Dynamic query re-optimization , 1999, Proceedings. Eleventh International Conference on Scientific and Statistical Database Management.

[13]  Guoliang Li,et al.  Trie-join , 2010, Proc. VLDB Endow..

[14]  Sören Auer,et al.  LIMES - A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data , 2011, IJCAI.

[15]  Robert Isele,et al.  Efficient Multidimensional Blocking for Link Discovery without losing Recall , 2011, WebDB.

[16]  Axel-Cyrille Ngonga Ngomo,et al.  HELIOS - Execution Optimization for Link Discovery , 2014, SEMWEB.

[17]  Haofen Wang,et al.  Zhishi.links results for OAEI 2011 , 2011, OM.

[18]  Guido Moerkotte,et al.  Histograms reloaded: the merits of bucket diversity , 2010, SIGMOD Conference.

[19]  Axel-Cyrille Ngonga Ngomo,et al.  On Link Discovery using a Hybrid Approach , 2012, Journal on Data Semantics.