The MM-Tree: A Memory-Based Metric Tree Without Overlap Between Nodes

Advanced database systems offer similarity queries on complex data. Similarity search on complex data is accelerated through metric access methods (MAMs), which organize the data so as to reduce the number of comparisons between elements when answering queries. MAMs fall into two types: disk-based and memory-based. Disk-based structures constrain how the space can be partitioned, because each node must hold multiple elements to match the disk page size. Memory-based trees allow more flexibility, yielding trees that are faster to build and to query. Although recent developments target disk-based tree structures, several applications benefit from a faster way to build indexes in main memory. This paper presents a memory-based metric tree, the MM-tree, which successively partitions the space into non-overlapping regions. We present experiments comparing the MM-tree with existing high-performance MAMs, including the disk-based Slim-tree. The experiments reveal that the MM-tree requires up to one fifth of the number of distance calculations to be constructed when compared with the Slim-tree, and that it answers range queries with 64% fewer distance calculations and kNN queries with 74% fewer distance calculations.
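To make the idea of non-overlapping partitioning concrete, the following Python fragment is a minimal sketch and not the algorithm as specified in this abstract. It assumes that each node stores two pivots, `s1` and `s2`, and that the distance `r` between them splits the space into four disjoint regions, depending on whether an element lies inside or outside the ball of radius `r` around each pivot. The names `Node`, `region`, `insert` and `range_query` are hypothetical, and the sketch omits any balancing.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


@dataclass
class Node:
    # Assumed layout: two pivots per node and four child subtrees.
    s1: Point
    s2: Optional[Point] = None
    r: float = 0.0  # distance between the two pivots
    children: List[Optional["Node"]] = field(default_factory=lambda: [None] * 4)


def region(d1: float, d2: float, r: float) -> int:
    """Map the distances to both pivots onto one of four disjoint regions."""
    if d1 < r:
        return 0 if d2 < r else 1
    return 2 if d2 < r else 3


def insert(node: Optional[Node], x: Point, dist: Metric) -> Node:
    """Insert x; every element ends up in exactly one region of each node."""
    if node is None:
        return Node(s1=x)
    if node.s2 is None:
        node.s2 = x
        node.r = dist(node.s1, x)
        return node
    i = region(dist(node.s1, x), dist(node.s2, x), node.r)
    node.children[i] = insert(node.children[i], x, dist)
    return node


def range_query(node: Optional[Node], q: Point, rq: float,
                dist: Metric, out: List[Point]) -> List[Point]:
    """Collect all stored elements within distance rq of q."""
    if node is None:
        return out
    d1 = dist(q, node.s1)
    if d1 <= rq:
        out.append(node.s1)
    if node.s2 is None:
        return out
    d2 = dist(q, node.s2)
    if d2 <= rq:
        out.append(node.s2)
    r = node.r
    # Triangle inequality: the query ball can reach the inside of a pivot
    # ball only if d - rq < r, and the outside only if d + rq >= r.
    in1, out1 = d1 - rq < r, d1 + rq >= r
    in2, out2 = d2 - rq < r, d2 + rq >= r
    visit = (in1 and in2, in1 and out2, out1 and in2, out1 and out2)
    for child, ok in zip(node.children, visit):
        if ok:
            range_query(child, q, rq, dist, out)
    return out


# Usage: index a small integer grid and run one range query.
points = [(x, y) for x in range(6) for y in range(6)]
root = None
for p in points:
    root = insert(root, p, math.dist)
hits = range_query(root, (2, 3), 2.5, math.dist, [])
```

Because the regions are disjoint, an element is reachable along exactly one path, so a query never reports the same element twice and subtrees that fail the two inequalities are skipped without any further distance calculation. Only two distances are computed per visited node.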
