Efficient Evaluation of All-Nearest-Neighbor Queries

The All Nearest Neighbor (ANN) operation is a commonly used primitive for analyzing large multi-dimensional datasets. Because computing ANN is very expensive, previous work has proposed R*-tree-based methods to speed up this computation. These traditional index-based methods use a pruning metric called MAXMAXDIST, which lets the algorithms prune index nodes that need not be traversed during the ANN computation. In this paper, we introduce a new pruning metric called NXNDIST and show that it is far more effective at pruning than the traditional MAXMAXDIST metric. We also challenge the common practice of using an R*-tree index to speed up the ANN computation. We propose an enhanced bucket quadtree index structure, called the MBRQT, and show through extensive experimental evaluation that the MBRQT index can significantly speed up the ANN computation. In addition, we present the MBA algorithm, which is based on a depth-first index traversal and a bi-directional node expansion strategy. Finally, our method can easily be extended to efficiently answer the more general All-k-Nearest-Neighbor (AkNN) queries.
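The pruning idea described above can be illustrated with a small sketch. This is not the paper's implementation, and it makes several assumptions. Nodes are axis-aligned minimum bounding rectangles (MBRs). A candidate node N is discarded for a query node M when MINMINDIST(M, N) exceeds the smallest upper bound, over all candidates, on the distance from any point of M to its nearest neighbor. MAXMAXDIST is the usual "largest possible distance between two boxes". `face_bound` is a hypothetical, tighter bound in the spirit of NXNDIST. It relies on the assumption that every face of a tight MBR touches at least one data point, so a point of M can always reach a point on the nearer face of N in one chosen dimension. All function names are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class MBR:
    lo: Sequence[float]  # lower corner, one value per dimension
    hi: Sequence[float]  # upper corner, one value per dimension


def min_min_dist(m: MBR, n: MBR) -> float:
    """Smallest possible distance between any point in m and any point in n."""
    total = 0.0
    for a, b, s, t in zip(m.lo, m.hi, n.lo, n.hi):
        gap = max(0.0, s - b, a - t)  # zero when the intervals overlap
        total += gap * gap
    return math.sqrt(total)


def max_max_dist(m: MBR, n: MBR) -> float:
    """Largest possible distance between any point in m and any point in n."""
    total = 0.0
    for a, b, s, t in zip(m.lo, m.hi, n.lo, n.hi):
        d = max(abs(b - s), abs(t - a))
        total += d * d
    return math.sqrt(total)


def face_bound(m: MBR, n: MBR) -> float:
    """Hypothetical NXNDIST-style upper bound on the nearest-neighbor distance
    from any point of m to the points of n. ASSUMES every face of n contains
    at least one data point (true for tight MBRs)."""
    max_sq: List[float] = []
    minmax_sq: List[float] = []
    for a, b, s, t in zip(m.lo, m.hi, n.lo, n.hi):
        max_sq.append(max(abs(b - s), abs(t - a)) ** 2)
        # Worst coordinate x in [a, b] w.r.t. the nearer of the faces at s and t:
        # the maximum is attained at an endpoint or at the midpoint of [s, t].
        candidates = [a, b]
        mid = (s + t) / 2
        if a <= mid <= b:
            candidates.append(mid)
        worst = max(min(abs(x - s), abs(x - t)) for x in candidates)
        minmax_sq.append(worst ** 2)
    # Use a face in the dimension that shrinks the MAXMAXDIST sum the most.
    best_gain = max(mx - mn for mx, mn in zip(max_sq, minmax_sq))
    return math.sqrt(max(0.0, sum(max_sq) - best_gain))


def prune(m: MBR, candidates: List[MBR],
          upper_bound: Callable[[MBR, MBR], float]) -> List[int]:
    """Indices of candidate nodes that survive pruning for query node m."""
    pruning_dist = min(upper_bound(m, n) for n in candidates)
    return [i for i, n in enumerate(candidates)
            if min_min_dist(m, n) <= pruning_dist]


if __name__ == "__main__":
    query = MBR((0.0, 0.0), (1.0, 1.0))
    cands = [MBR((2.0, 0.0), (3.0, 1.0)), MBR((3.5, 0.0), (4.5, 1.0))]
    print(prune(query, cands, max_max_dist))  # [0, 1]
    print(prune(query, cands, face_bound))    # [0]
```

With these boxes, MAXMAXDIST yields a pruning distance of sqrt(10) ≈ 3.16 and keeps both candidates. The face-based bound yields sqrt(5) ≈ 2.24 and discards the second box, whose MINMINDIST is 2.5. This is the kind of extra pruning a tighter metric buys.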
