MX-tree: A Double Hierarchical Metric Index with Overlap Reduction

Large multimedia repositories often call for a highly efficient index, supported by external memory, to retrieve the desired information quickly. The M-tree, a metric tree, is a well-tested, dynamic index structure for similarity search in metric spaces, where various distance measures can be applied. Nevertheless, its performance is severely undermined by the number of paths it has to traverse, which increases both CPU and I/O costs. In this paper, we present an analysis that demonstrates the gravity of this issue and, in response, propose a novel index structure called the MX-tree. It introduces the super node, inspired by the X-tree from spatial search, and fully extends the super node to metric spaces. In addition, the MX-tree presents a new node split method to keep the cost of index construction low. The proposed method needs only O(n^2) runtime to split an overfull node, without tuning any parameter, while the search performance of the whole index remains guaranteed relative to the O(n^2) node split policy of the M-tree. Furthermore, an internal index is proposed in the MX-tree to handle the CPU costs that arise in the extended leaf nodes created by the introduction of the super node. Compared with earlier improvements of the M-tree, the MX-tree retains all the merits of the M-tree without any post-processing steps or loss of applicability. Extensive experiments illustrate the efficiency of the MX-tree with regard to both CPU and I/O costs.
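The multi-path problem mentioned above can be illustrated with a minimal sketch. It is not the MX-tree itself; it only shows the standard ball-intersection test a metric tree applies at a routing node during a range query, and how overlapping covering balls force more children to be visited. The function name, the entry layout (pivot, covering radius), and the sample data are all hypothetical, and Euclidean distance stands in for an arbitrary metric.

```python
from math import dist  # Euclidean distance, used here as an example metric


def children_to_visit(query, radius, entries):
    """Return the routing entries whose covering ball intersects the query ball.

    Each entry is a hypothetical (pivot, covering_radius) pair. By the triangle
    inequality, a child can be skipped only if
    d(query, pivot) > radius + covering_radius.
    """
    return [(p, r) for p, r in entries if dist(query, p) <= radius + r]


# Three children with heavily overlapping covering balls (illustrative data).
overlapping = [((0.0, 0.0), 2.0), ((1.0, 0.0), 2.0), ((0.0, 1.0), 2.0)]
# Three children with small, well-separated covering balls (illustrative data).
disjoint = [((0.0, 0.0), 0.5), ((4.0, 0.0), 0.5), ((0.0, 4.0), 0.5)]

query, radius = (0.5, 0.5), 0.3
n_overlapping = len(children_to_visit(query, radius, overlapping))
n_disjoint = len(children_to_visit(query, radius, disjoint))
print(n_overlapping, n_disjoint)
```

With the overlapping layout every child has to be descended into, whereas the separated layout lets the same query follow a single path; each extra path costs additional distance computations and node reads.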
