To Reveal the Performance Secrets of the Newest NN Searching Algorithm

Nearest neighbor (NN) search is widely used in spatial and multimedia databases. The incremental NN (INN) search algorithm is regarded as the optimal NN search algorithm because it accesses the minimum number of nodes and can be used whether or not the number of objects to be retrieved is fixed in advance. This paper presents an analytical model for estimating the performance of the INN search algorithm. For the first time, our model takes m (the number of neighbor objects finally reported), n (the cardinality of the database), and d (the dimensionality) as parameters, and it focuses on the total number of node accesses (not only the number of accessed leaf nodes) and on the length of the priority queue. Using our model, the curse of dimensionality is revealed mathematically for an arbitrary number of retrieved NN objects. In our model, (1) for the first time, the two key factors d_m (the distance from the m-th NN object to the query point) and σ_h (the side length of each node) are estimated using both their upper and their lower bounds, which improves the effectiveness of the model, especially in high-dimensional spaces; (2) for the first time, the possible difference in fanout among the leaf nodes, the root node, and the other nodes is taken into account. The theoretical analysis is verified by experiments.
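
To make the two measured quantities concrete, the following is a minimal Python sketch of a best-first incremental NN traversal over a toy bounding-box tree. It is an illustration under stated assumptions, not the model or the implementation from the paper: the names `Node`, `mindist`, `inn_search`, and the `stats` dictionary are hypothetical, and the tree is built by hand rather than by an R-tree insertion algorithm. The sketch counts node accesses and tracks the maximum length of the priority queue, the two costs the abstract refers to.

```python
import heapq
import itertools
import math


class Node:
    """Toy tree node: a leaf holds points, an internal node holds children."""

    def __init__(self, children=None, points=None):
        self.children = children or []
        self.points = points or []
        # Bounding box from the points (leaf) or child box corners (internal).
        pts = self.points or [c for ch in self.children for c in (ch.lo, ch.hi)]
        dims = range(len(pts[0]))
        self.lo = tuple(min(p[i] for p in pts) for i in dims)
        self.hi = tuple(max(p[i] for p in pts) for i in dims)


def mindist(q, lo, hi):
    """Smallest Euclidean distance from query q to the box [lo, hi]."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, lo, hi)))


def inn_search(root, q, stats):
    """Yield (distance, point) pairs in non-decreasing distance from q.

    stats is updated in place with the number of node accesses and the
    maximum priority-queue length observed so far.
    """
    tie = itertools.count()  # tie-breaker so heap never compares payloads
    heap = [(mindist(q, root.lo, root.hi), next(tie), root)]
    while heap:
        stats["max_queue"] = max(stats["max_queue"], len(heap))
        dist, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            stats["node_accesses"] += 1
            for child in item.children:
                heapq.heappush(
                    heap, (mindist(q, child.lo, child.hi), next(tie), child))
            for p in item.points:
                heapq.heappush(heap, (math.dist(q, p), next(tie), p))
        else:
            yield dist, item  # a data point: the next nearest neighbor


# Usage: m does not have to be fixed in advance; the caller simply stops
# consuming the generator once enough neighbors have been reported.
leaves = [Node(points=[(0, 0), (1, 1)]),
          Node(points=[(5, 5), (6, 6)]),
          Node(points=[(10, 10), (11, 12)])]
root = Node(children=leaves)
stats = {"node_accesses": 0, "max_queue": 0}
first_two = list(itertools.islice(inn_search(root, (0.5, 0.2), stats), 2))
```

Because the traversal is a generator, the costs recorded in `stats` reflect only the work done to report the neighbors actually consumed, which is the sense in which m can be left open.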
