An efficient cost model for optimization of nearest neighbor search in low and medium dimensional spaces

Existing models for nearest neighbor search in multidimensional spaces are not appropriate for query optimization because they either produce erroneous estimates or involve complex equations that are expensive to evaluate in real time. This article proposes an alternative method that captures the performance of nearest neighbor queries using approximation. For uniform data, our model involves closed formulae that are very efficient to compute and accurate for up to 10 dimensions. Further, the proposed equations can be applied to nonuniform data with the aid of histograms. We demonstrate the effectiveness of the model by using it to solve several optimization problems related to nearest neighbor search.
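
To illustrate what a closed formula for uniform data can look like, the sketch below computes a textbook estimate of the distance to the k-th nearest neighbor: the radius of the d-dimensional ball whose volume is expected to hold k of the N points in a unit data space. This is not the model proposed in the article, whose equations are not reproduced here; it ignores boundary effects, and the names `unit_ball_volume` and `estimated_knn_radius` are hypothetical.

```python
import math


def unit_ball_volume(d: int) -> float:
    """Volume of the d-dimensional Euclidean ball with radius 1."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def estimated_knn_radius(n_points: int, d: int, k: int = 1) -> float:
    """Rough k-th nearest neighbor distance for uniform data in [0, 1]^d.

    Assumption (not from the article): solve
        n_points * unit_ball_volume(d) * r**d = k
    for r, ignoring the boundary of the data space.
    """
    if n_points <= 0 or d <= 0 or not (0 < k <= n_points):
        raise ValueError("need n_points > 0, d > 0 and 0 < k <= n_points")
    return (k / (n_points * unit_ball_volume(d))) ** (1.0 / d)


if __name__ == "__main__":
    # The estimated radius grows quickly with dimensionality.
    for dim in (2, 4, 6, 8, 10):
        print(dim, estimated_knn_radius(100_000, dim))
```

An estimate of this kind costs a handful of arithmetic operations per query, which is the property that makes closed formulae attractive to a query optimizer. As the dimensionality grows, the estimated radius approaches the extent of the data space and the ignored boundary effects become significant.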
