HELIOS - Execution Optimization for Link Discovery

Links between knowledge bases form the backbone of the Linked Data Web. Previous work has shown that combining the results of time-efficient algorithms through set-theoretical operators is a very time-efficient approach to Link Discovery. However, the further optimization of such link specifications has received little attention. We address this gap by presenting Helios, a runtime optimizer for Link Discovery. Helios comprises both a rewriter and an execution planner for link specifications. The rewriter is a sequence of fixed-point iterators for algebraic rules. The planner relies on time-efficient evaluation functions to generate execution plans for link specifications. We evaluate Helios on 17 specifications created by human experts and 2180 specifications generated automatically. Our evaluation shows that Helios is up to 300 times faster than a canonical planner. Moreover, the improvements achieved by Helios are statistically significant.
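
To make the idea of "a sequence of fixed-point iterators for algebraic rules" concrete, the following is a minimal sketch and not the Helios implementation. The `Atom` and `Op` types, the two rules (`flatten` and `dedupe`), and all function names are assumptions introduced purely for illustration; the actual rule set used by Helios is not stated in the abstract. The sketch assumes only that link specifications combine atomic similarity conditions through set-theoretical operators, so that idempotence and associativity hold.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atom:
    """Hypothetical atomic condition: a similarity measure with a threshold."""
    measure: str
    threshold: float


@dataclass(frozen=True)
class Op:
    """Hypothetical set-theoretical operator ("AND" or "OR") over sub-specifications."""
    kind: str
    children: Tuple["Spec", ...]


Spec = Union[Atom, Op]


def flatten(spec: Spec) -> Spec:
    """Illustrative rule (associativity): AND(AND(a, b), c) -> AND(a, b, c)."""
    if isinstance(spec, Op):
        merged = []
        for child in spec.children:
            if isinstance(child, Op) and child.kind == spec.kind:
                merged.extend(child.children)
            else:
                merged.append(child)
        return Op(spec.kind, tuple(merged))
    return spec


def dedupe(spec: Spec) -> Spec:
    """Illustrative rule (idempotence): AND(x, x) -> x and OR(x, x) -> x."""
    if isinstance(spec, Op):
        unique = tuple(dict.fromkeys(spec.children))  # keeps first occurrences, in order
        return unique[0] if len(unique) == 1 else Op(spec.kind, unique)
    return spec


def apply_everywhere(rule, spec: Spec) -> Spec:
    """Apply one rule bottom-up to every node of the specification tree."""
    if isinstance(spec, Op):
        spec = Op(spec.kind, tuple(apply_everywhere(rule, c) for c in spec.children))
    return rule(spec)


def fixed_point(rule, spec: Spec) -> Spec:
    """Re-apply a single rule until the specification no longer changes."""
    while True:
        rewritten = apply_everywhere(rule, spec)
        if rewritten == spec:
            return spec
        spec = rewritten


def rewrite(spec: Spec, rules=(flatten, dedupe)) -> Spec:
    """Run one fixed-point iterator per rule, in sequence."""
    for rule in rules:
        spec = fixed_point(rule, spec)
    return spec


a = Atom("trigrams", 0.8)
b = Atom("levenshtein", 0.9)
simplified = rewrite(Op("AND", (Op("AND", (a, b)), a)))
```

In this sketch, the redundant specification `AND(AND(a, b), a)` is first flattened and then deduplicated, so that each atomic measure would be executed only once. A planner in the spirit of the one described above would then choose an execution order for the remaining atoms based on estimated runtimes; that part is not modeled here.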
