M-tree: An Efficient Access Method for Similarity Search in Metric Spaces

A new access method, called M-tree, is proposed to organize and search large data sets from a generic “metric space”, i.e., one where object proximity is defined only by a distance function satisfying the positivity, symmetry, and triangle inequality postulates. We detail algorithms for insertion of objects and split management, which keep the M-tree always balanced; several heuristic split alternatives are considered and experimentally evaluated. Algorithms for similarity (range and k-nearest neighbors) queries are also described. Results from extensive experimentation with a prototype system are reported, using the number of page I/Os and the number of distance computations as the performance criteria. The results demonstrate that the M-tree indeed extends the domain of applicability beyond the traditional vector spaces, performs reasonably well in high-dimensional data spaces, and scales well as files grow.
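To illustrate why the triangle inequality matters for range queries, and why distance computations are a natural cost measure, the following is a minimal sketch. It is not the M-tree algorithm itself: it uses a flat list of clusters rather than a balanced, paged tree. The names `Ball`, `range_query`, and `euclidean` are hypothetical and chosen only for this example, and any function satisfying the metric postulates could replace `euclidean`.

```python
import math


def euclidean(a, b):
    """Example metric; any distance obeying the metric postulates would do."""
    return math.dist(a, b)


class Ball:
    """Hypothetical cluster: a center object, its members, and a covering radius."""

    def __init__(self, center, members, dist):
        self.center = center
        self.members = list(members)
        # Covering radius: the largest distance from the center to any member.
        self.radius = max((dist(center, m) for m in self.members), default=0.0)


def range_query(balls, query, r, dist):
    """Return (objects within distance r of query, number of distance computations)."""
    results = []
    computations = 0
    for ball in balls:
        d_center = dist(query, ball.center)
        computations += 1
        # Triangle inequality: for every member m,
        #   dist(query, m) >= d_center - ball.radius.
        # If that lower bound exceeds r, no member can qualify, so skip the ball.
        if d_center > r + ball.radius:
            continue
        for m in ball.members:
            computations += 1
            if dist(query, m) <= r:
                results.append(m)
    return results, computations


balls = [
    Ball((0.0, 0.0), [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], euclidean),
    Ball((10.0, 10.0), [(10.0, 10.0), (11.0, 10.0)], euclidean),
]
hits, cost = range_query(balls, (0.5, 0.0), 1.0, euclidean)
print(hits, cost)
```

In this toy data set the second cluster is discarded after a single distance computation, so the query evaluates five distances instead of the seven a linear scan with one extra center check per cluster would require. The pruning relies only on the metric postulates, not on coordinates, which is why the same idea carries over to non-vector data.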
