The Lazy Traveling Salesman - Memory Management for Large-Scale Link Discovery

Links between knowledge bases form the backbone of the Linked Data Web. Previous work has produced several time-efficient algorithms for computing links between knowledge bases. Most of these approaches compare resource properties using similarity or distance functions, or combinations thereof. However, they pay little attention to the fact that very large datasets cannot be held in the main memory of most computing devices. In this paper, we present a generic memory management approach for Link Discovery. We show that the problem at hand is a variation of the traveling salesman problem and is thus NP-complete. We therefore provide efficient graph-based algorithms for scheduling link discovery tasks. Our evaluation on real data shows that our approach allows computing links between large numbers of resources efficiently.
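To illustrate why task order matters and where the traveling-salesman connection comes from, here is a minimal sketch under our own assumptions. It is not the paper's algorithm. We assume the data is split into numbered blocks, each link discovery task compares two blocks that must both be in memory, and memory holds at most `capacity` blocks with least-recently-used (LRU) eviction. Treating tasks as cities and "blocks that must be newly loaded" as the edge weight turns scheduling into a TSP-like tour; the greedy nearest-neighbor rule below is a standard TSP heuristic swapped in for illustration. The names `count_loads`, `task_distance` and `nearest_neighbor_order` are hypothetical.

```python
from collections import OrderedDict
from typing import List, Tuple

# Hypothetical model: a task compares two data blocks (e.g., a source block
# and a target block); both must be in memory while the task runs.
Task = Tuple[int, int]


def count_loads(order: List[Task], capacity: int) -> int:
    """Count block loads when running tasks in `order` with an LRU cache
    that holds at most `capacity` blocks."""
    if capacity < 2:
        raise ValueError("capacity must hold both blocks of a task")
    cache: "OrderedDict[int, None]" = OrderedDict()
    loads = 0
    for task in order:
        needed = list(dict.fromkeys(task))  # dedupe, keep task order
        # Refresh blocks that are already cached so they are not evicted
        # while the missing blocks of the same task are loaded.
        for block in needed:
            if block in cache:
                cache.move_to_end(block)
        for block in needed:
            if block not in cache:
                loads += 1
                cache[block] = None
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used
    return loads


def task_distance(t1: Task, t2: Task) -> int:
    """TSP-style edge weight: blocks of t2 that t1 does not already use."""
    return len(set(t2) - set(t1))


def nearest_neighbor_order(tasks: List[Task]) -> List[Task]:
    """Greedy nearest-neighbor tour over tasks, starting at the first one."""
    remaining = list(tasks)
    if not remaining:
        return []
    order = [remaining.pop(0)]
    while remaining:
        current = order[-1]
        best = min(range(len(remaining)),
                   key=lambda i: task_distance(current, remaining[i]))
        order.append(remaining.pop(best))
    return order


tasks = [(0, 2), (1, 3), (0, 3), (1, 2)]
print(count_loads(tasks, capacity=2))  # prints 7
schedule = nearest_neighbor_order(tasks)
print(schedule, count_loads(schedule, capacity=2))
# prints [(0, 2), (0, 3), (1, 3), (1, 2)] 5
```

On this toy input, the greedy order needs 5 block loads instead of 7. Here 5 is also the minimum, because the first task loads two blocks and each of the three transitions between distinct tasks loads at least one more. In general, nearest neighbor carries no optimality guarantee, which is consistent with the problem being NP-complete.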
