The HeightBL Algorithm for Bulk-loading F-Onion-trees

The F-Onion-tree is a robust access method that slices the metric space into disjoint subspaces to provide quick indexing of complex data in main memory. However, the F-Onion-tree supports only element-by-element insertion; it provides no technique for building the index from all elements of the dataset at once. In this article, we fill this gap by proposing the HeightBL algorithm for bulk-loading F-Onion-trees. Performance tests on real-world datasets of different volumes and dimensionalities showed that the index produced by the HeightBL algorithm is very compact: compared with element-by-element insertion, the index size was reduced by 53.42% to 71.25%. The experiments also showed that the HeightBL algorithm significantly improved the performance of range and k-NN query processing. It required 13.38% to 99.94% fewer distance calculations and was 8.57% to 99.04% faster than element-by-element insertion.
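The abstract does not give the formula behind the reported percentages. One natural reading treats each figure as a relative reduction measured against element-by-element insertion. Here $C_{\text{ins}}$ and $C_{\text{bulk}}$ are notation introduced only for this illustration: they stand for a cost (index size, number of distance calculations, or query time) under element-by-element insertion and under HeightBL, respectively.

$$
\text{reduction} = \frac{C_{\text{ins}} - C_{\text{bulk}}}{C_{\text{ins}}} \times 100\%
$$

Under this assumed reading, a 99.94% reduction in distance calculations gives $C_{\text{bulk}} = 0.0006\,C_{\text{ins}}$. That is roughly one distance calculation for every 1,667 that the element-by-element index requires.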
