How to improve the pruning ability of dynamic metric access methods

Complex data retrieval is accelerated using index structures, which organize the data in order to prune comparisons between data during queries. In metric spaces, comparison operations can be specially expensive, so the pruning ability of indexing methods turns out to be specially meaningful. This paper shows how to measure the pruning power of metric access methods, and defines a new measurement, called "prunability," which indicates how well a pruning technique carries out the task of cutting down distance calculations at each tree level. It also presents a new dynamic access method, aiming to minimize the number of distance calculations required to answer similarity queries. We show that this novel structure is up to 3 times faster and requires less than 25% distance calculations to answer similarity queries, as compared to existing methods. This gain in performance is achieved by taking advantage of a set of global representatives. Although our technique uses multiple representatives, the index structure still remains dynamic and balanced.

[1]  Walter A. Burkhard,et al.  Some approaches to best-match file searching , 1973, Commun. ACM.

[2]  Hans-Peter Kriegel,et al.  The X-tree : An Index Structure for High-Dimensional Data , 2001, VLDB.

[3]  Christos Faloutsos,et al.  Similarity search without tears: the OMNI-family of all-purpose access methods , 2001, Proceedings 17th International Conference on Data Engineering.

[4]  Christos Faloutsos,et al.  Distance Exponent: A New Concept for Selectivity Estimation in Metric Trees , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[5]  Christos Faloutsos,et al.  Slim-Trees: High Performance Metric Trees Minimizing Overlap Between Nodes , 2000, EDBT.

[6]  Oliver Günther,et al.  Multidimensional access methods , 1998, CSUR.

[7]  Peter N. Yianilos,et al.  Data structures and algorithms for nearest neighbor search in general metric spaces , 1993, SODA '93.

[8]  Christos Faloutsos,et al.  Spatial join selectivity using power laws , 2000, SIGMOD '00.

[9]  Z. Meral Özsoyoglu,et al.  Distance-based indexing for high-dimensional metric spaces , 1997, SIGMOD '97.

[10]  Sharad Mehrotra,et al.  The hybrid tree: an index structure for high dimensional feature spaces , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[11]  Sergey Brin,et al.  Near Neighbor Search in Large Metric Spaces , 1995, VLDB.

[12]  Shin'ichi Satoh,et al.  The SR-tree: an index structure for high-dimensional nearest neighbor queries , 1997, SIGMOD '97.

[13]  Jeffrey K. Uhlmann,et al.  Satisfying General Proximity/Similarity Queries with Metric Trees , 1991, Inf. Process. Lett..

[14]  Jonathan Goldstein,et al.  When Is ''Nearest Neighbor'' Meaningful? , 1999, ICDT.

[15]  Pavel Zezula,et al.  M-tree: An Efficient Access Method for Similarity Search in Metric Spaces , 1997, VLDB.