Approximation Techniques to Enable Dimensionality Reduction for Voronoi-Based Nearest Neighbor Search

Utilizing spatial index structures on secondary memory for nearest neighbor search in high-dimensional data spaces has been the subject of much research. With the potential to host larger indexes in main memory, applications demanding a high query throughput stand to benefit from index structures tailored to that environment. “Index once, query at very high frequency” scenarios on semi-static data require particularly fast responses while allowing for more extensive precalculations. One such precalculation consists of indexing the solution space for nearest neighbor queries, as done by the approximate Voronoi cell-based method. A major deficiency of this promising approach is that it offers no way to incorporate effective dimensionality reduction techniques. We propose methods to overcome the difficulties faced for normalized data and present a second reduction step that improves response times by limiting the dimensionality of the Voronoi cell approximations. In addition, we evaluate the suitability of our approach for main memory indexing, where speedup factors of up to five can be observed for real-world data sets.
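To make the idea of indexing the solution space concrete, the following is a minimal 2D sketch and not the method evaluated in the paper. It assumes Euclidean distance and a unit-square data space, computes each Voronoi cell exactly by half-plane clipping, and approximates it by a bounding box. A query then only refines the points whose box contains it. All names (`clip`, `voronoi_box`, `build_index`, `nn_query`) are hypothetical, the linear scan over boxes stands in for a real spatial index, and the `dims` parameter only loosely mirrors the second reduction step by testing the boxes in a subset of dimensions.

```python
import random

UNIT_SQUARE = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

def clip(poly, a, b, c):
    """Keep the part of a convex polygon where a*x + b*y <= c."""
    out = []
    for i in range(len(poly)):
        p, q = poly[i], poly[(i + 1) % len(poly)]
        fp = a * p[0] + b * p[1] - c
        fq = a * q[0] + b * q[1] - c
        if fp <= 0:
            out.append(p)  # p lies inside the half-plane
        if (fp < 0 < fq) or (fq < 0 < fp):
            t = fp / (fp - fq)  # edge crosses the boundary line
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out

def voronoi_box(points, i, eps=1e-9):
    """Bounding box of the Voronoi cell of points[i] inside the unit square."""
    px, py = points[i]
    cell = UNIT_SQUARE
    for j, (qx, qy) in enumerate(points):
        if j != i:
            # Bisector half-plane: x is at least as close to p as to q.
            cell = clip(cell, 2 * (qx - px), 2 * (qy - py),
                        qx * qx + qy * qy - px * px - py * py)
    xs = [v[0] for v in cell]
    ys = [v[1] for v in cell]
    # Pad slightly so rounding errors cannot exclude a boundary query.
    return ((min(xs) - eps, max(xs) + eps), (min(ys) - eps, max(ys) + eps))

def build_index(points):
    """Precompute one cell approximation per data point."""
    return [voronoi_box(points, i) for i in range(len(points))]

def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def nn_query(points, boxes, q, dims=(0, 1)):
    """Filter by box containment in `dims`, then refine by exact distance."""
    cands = [i for i, box in enumerate(boxes)
             if all(box[d][0] <= q[d] <= box[d][1] for d in dims)]
    return min(cands, key=lambda i: sq_dist(points[i], q))

# Small demo on seeded random data (assumed to lie in the unit square).
rng = random.Random(0)
data = [(rng.random(), rng.random()) for _ in range(20)]
index = build_index(data)
print(nn_query(data, index, (0.5, 0.5)))
```

Because the query point always lies in the Voronoi cell of its nearest neighbor, and that cell lies inside its bounding box, the filter step never discards the true answer. Testing fewer dimensions (for example `dims=(0,)`) keeps this guarantee but admits more candidates, which is the trade-off a reduced-dimensionality approximation has to balance.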
