An evaluation of models for runtime approximation in link discovery

Time-efficient link discovery is of central importance for implementing the vision of the Semantic Web. Some of the fastest link discovery approaches rely internally on planning to execute link specifications. In recent work, linear models have been used to estimate the runtime of the fastest planners. However, no other category of models has been studied for this purpose so far. In this paper, we study non-linear functions for runtime estimation. In particular, we study exponential and mixed models for estimating the runtimes of planners. To this end, we evaluate three different runtime models on six datasets using 500 link specifications. We show that exponential and mixed models achieve better fits during training but are preferable only in some cases. Our evaluation also shows that using better runtime approximation models has a positive impact on the overall execution of link specifications.
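
To make the contrast between a linear and an exponential runtime model concrete, the following is a minimal sketch and not the fitting procedure used in the paper. It assumes a single hypothetical feature x (for example, a similarity threshold), synthetic runtimes, and an exponential model of the form exp(a + b*x) fitted by ordinary least squares on the logarithm of the runtime. All names and data are illustrative assumptions, and the mixed model is omitted.

```python
import math


def fit_linear(xs, ys):
    """Ordinary least squares for y ~ a + b*x with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def fit_exponential(xs, ys):
    """Fit y ~ exp(a + b*x) via least squares on log(y); needs y > 0.

    Assumption: a log-transform fit stands in for whatever non-linear
    optimizer an actual system would use.
    """
    return fit_linear(xs, [math.log(y) for y in ys])


def rmse(predict, xs, ys):
    """Root mean squared error of a prediction function on (xs, ys)."""
    return math.sqrt(
        sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    )


# Synthetic (hypothetical) measurements: runtime grows exponentially in x.
xs = [0.1 * i for i in range(1, 11)]
ys = [math.exp(0.5 + 2.0 * x) for x in xs]

a_lin, b_lin = fit_linear(xs, ys)
a_exp, b_exp = fit_exponential(xs, ys)

rmse_lin = rmse(lambda x: a_lin + b_lin * x, xs, ys)
rmse_exp = rmse(lambda x: math.exp(a_exp + b_exp * x), xs, ys)

print(f"linear RMSE:      {rmse_lin:.4f}")
print(f"exponential RMSE: {rmse_exp:.4f}")
```

On data generated this way the exponential model fits better by construction. On real planner runtimes, either model may win, which is consistent with the finding above that non-linear models are preferable only in some cases.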
