Bulk-loading Dynamic Metric Access Methods

The main contribution of this paper is a bulk-loading algorithm for multi-way dynamic metric access methods that are based on the covering radius of a representative, such as the Slim-tree. The proposed algorithm is sample-based and builds a height-balanced tree in a top-down fashion, using the metric domain’s distance function and a bound limit to group the elements and to determine the number of elements in each partition of the dataset at each step. Experiments evaluating its performance show that our bulk-loading method builds a tree up to 6 times faster than sequential insertion, and that it also improves search performance.
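
To make the idea concrete, the following is a minimal Python sketch of a sample-based, top-down bulk load for a covering-radius tree. It is not the algorithm proposed in the paper, whose details are not given here. The names `Node`, `bulk_load`, `_build`, `capacity` and `seed` are hypothetical. The sketch also assumes that representatives are drawn uniformly at random, that each element joins its nearest representative with free room, and that the bound limit of a partition is `capacity ** (height - 1)`, which is what keeps every leaf at the same depth.

```python
import random
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Node:
    rep: Any          # representative element of this node
    radius: float     # covering radius: max distance from rep to anything below
    children: List["Node"] = field(default_factory=list)  # empty for leaves
    items: List[Any] = field(default_factory=list)        # filled only in leaves


def bulk_load(items, dist, capacity, seed=0):
    """Build a height-balanced tree top-down from all items at once."""
    if not items or capacity < 2:
        raise ValueError("need at least one item and capacity >= 2")
    rng = random.Random(seed)
    # Smallest height whose full tree can hold every item.
    height = 1
    while capacity ** height < len(items):
        height += 1
    # Assumption: the root representative is arbitrary (first item).
    return _build(list(items), items[0], dist, capacity, height, rng)


def _build(items, rep, dist, capacity, height, rng):
    radius = max(dist(rep, x) for x in items)
    if height == 1:
        return Node(rep, radius, items=items)

    # Bound limit: the most elements a subtree one level down can hold.
    limit = capacity ** (height - 1)
    k = min(capacity, len(items))

    # Sample k representatives; each one seeds its own partition.
    pivot_idx = rng.sample(range(len(items)), k)
    chosen = set(pivot_idx)
    groups = [[items[i]] for i in pivot_idx]

    # Assign every other element to the nearest representative with room.
    for i, x in enumerate(items):
        if i in chosen:
            continue
        order = sorted(range(k), key=lambda g: dist(groups[g][0], x))
        target = next(g for g in order if len(groups[g]) < limit)
        groups[target].append(x)

    children = [_build(g, g[0], dist, capacity, height - 1, rng) for g in groups]
    return Node(rep, radius, children=children)
```

For example, `bulk_load([float(i) for i in range(50)], lambda a, b: abs(a - b), capacity=4)` builds a tree over 50 one-dimensional points under the absolute-difference metric. Because every partition is capped at the bound limit and every recursive call goes exactly one level down, all leaves end up at the same depth. Node occupancy can be low, however, and a practical method would need to address that.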
