A Query-sensitive Cost Model for Similarity Queries with M-tree

We introduce a cost model for the M-tree access method [Ciaccia et al., 1997] that estimates the CPU cost (number of distance computations) and the I/O cost of executing a similarity query, as a function of the individual query. The model is query-sensitive: through the novel notion of a “witness”, it takes into account the “position” of the query point within the metric space indexed by the M-tree. We describe the basic concepts underlying the model and several methods for implementing it, and we validate the model experimentally on both real and synthetic datasets.
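To make the idea concrete, the following is a minimal sketch of how such an estimate could be computed for a range query. It is not the model as specified in the paper. It assumes that a node with covering radius r must be visited when the distance between the query and the node's routing object is at most r plus the query radius, and that the probability of this event is read off an empirical distribution of distances. Taking that distribution from a single "witness" object near the query is only one plausible reading of the witness notion, and all names here (`NodeStats`, `empirical_cdf`, `estimate_range_query_cost`) are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Callable, Iterable, Sequence, Tuple


@dataclass
class NodeStats:
    """Per-node statistics assumed to be available for the tree (hypothetical)."""
    covering_radius: float  # radius of the ball bounding the node's subtree
    num_entries: int        # number of entries stored in the node


def empirical_cdf(distances: Sequence[float]) -> Callable[[float], float]:
    """Return F(x) = fraction of the sampled distances that are <= x."""
    if not distances:
        raise ValueError("at least one sampled distance is required")
    ordered = sorted(distances)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def estimate_range_query_cost(
    nodes: Iterable[NodeStats],
    cdf: Callable[[float], float],
    query_radius: float,
) -> Tuple[float, float]:
    """Estimate (I/O cost, CPU cost) of a range query.

    Assumption: a node is accessed with probability F(r_node + r_query), and
    every entry of an accessed node costs one distance computation
    (pruning optimizations are ignored).
    """
    io_cost = 0.0
    cpu_cost = 0.0
    for node in nodes:
        p_access = cdf(node.covering_radius + query_radius)
        io_cost += p_access
        cpu_cost += node.num_entries * p_access
    return io_cost, cpu_cost


# Distances from an assumed witness object to a sample of the dataset.
witness_distances = [1.0, 2.0, 3.0, 4.0]
tree_nodes = [NodeStats(covering_radius=1.0, num_entries=10),
              NodeStats(covering_radius=0.5, num_entries=4)]

io_estimate, cpu_estimate = estimate_range_query_cost(
    tree_nodes, empirical_cdf(witness_distances), query_radius=1.0
)
print(io_estimate, cpu_estimate)
```

Passing a distance distribution taken from a different witness changes the estimate for the same tree, which is the sense in which an estimate of this kind depends on where the query lies.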

[1] Tzi-cker Chiueh, et al. Content-Based Image Indexing, 1994, VLDB.

[2] D. W. Scott, et al. Multivariate Density Estimation, Theory, Practice and Visualization, 1992.

[3] Hans-Peter Kriegel, et al. The R*-tree: an efficient and robust access method for points and rectangles, 1990, SIGMOD '90.

[4] Karl Aberer, et al. Efficient querying on genomic databases by using metric space indexing techniques, 1997, Database and Expert Systems Applications. 8th International Conference, DEXA '97. Proceedings.

[5] Dario Maio, et al. A structural approach to fingerprint classification, 1996, Proceedings of 13th International Conference on Pattern Recognition.

[6] Daniel P. Huttenlocher, et al. Comparing Images Using the Hausdorff Distance, 1993, IEEE Trans. Pattern Anal. Mach. Intell.

[7] Christos Faloutsos, et al. On packing R-trees, 1993, CIKM '93.

[8] Pavel Zezula, et al. A cost model for similarity queries in metric spaces, 1998, PODS '98.

[9] Antonin Guttman, et al. R-trees: a dynamic index structure for spatial searching, 1984, SIGMOD '84.

[10] Marco Patella, et al. Bulk Loading the M-tree, 2001.

[11] Pavel Zezula, et al. M-tree: An Efficient Access Method for Similarity Search in Metric Spaces, 1997, VLDB.

[12] Christos Faloutsos, et al. Beyond uniformity and independence: analysis of R-trees using the concept of fractal dimension, 1994, PODS.

[13] Z. Meral Özsoyoglu, et al. Distance-based indexing for high-dimensional metric spaces, 1997, SIGMOD '97.

[14] Christian Böhm, et al. A cost model for nearest neighbor search in high-dimensional data space, 1997, PODS.

[15] Sergey Brin, et al. Near Neighbor Search in Large Metric Spaces, 1995, VLDB.

[16] Timos K. Sellis, et al. A model for the prediction of R-tree performance, 1996, PODS.

[17] Anil K. Jain, et al. Algorithms for Clustering Data, 1988.

[18] Hans-Jörg Schek, et al. A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces, 1998, VLDB.

[19] J. Simonoff. Multivariate Density Estimation, 1996.

[20] Yannis Manolopoulos, et al. Performance of Nearest Neighbor Queries in R-Trees, 1997, ICDT.