High-dimensional kNN joins with incremental updates

The k Nearest Neighbor (kNN) join operation associates each data object in one data set with its k nearest neighbors from the same or a different data set. The kNN join on high-dimensional data (high-dimensional kNN join) is a very expensive operation. Existing high-dimensional kNN join algorithms were designed for static data sets and therefore cannot handle updates efficiently. In this article, we propose a novel kNN join method, named kNNJoin+, which supports efficient incremental computation of kNN join results with updates on high-dimensional data. As a by-product, our method also provides answers for the reverse kNN queries with very little overhead. We have performed an extensive experimental study. The results show the effectiveness of kNNJoin+ for processing high-dimensional kNN joins in dynamic workloads.

[1]  Christian Böhm,et al.  The k-Nearest Neighbour Join: Turbo Charging the KDD Process , 2004, Knowledge and Information Systems.

[2]  Beng Chin Ooi,et al.  iDistance: An adaptive B+-tree based indexing method for nearest neighbor search , 2005, TODS.

[3]  Elke Achtert,et al.  Efficient reverse k-nearest neighbor search in arbitrary metric spaces , 2006, SIGMOD Conference.

[4]  Stefan Berchtold,et al.  High-dimensional index structures database support for next decade's applications (tutorial) , 1998, SIGMOD '98.

[5]  Jonathan Goldstein,et al.  When Is ''Nearest Neighbor'' Meaningful? , 1999, ICDT.

[6]  Beng Chin Ooi,et al.  Multiple aggregations over data streams , 2005, SIGMOD '05.

[7]  Antonin Guttman,et al.  R-trees: a dynamic index structure for spatial searching , 1984, SIGMOD '84.

[8]  Christian Böhm,et al.  Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases , 2001, CSUR.

[9]  Hans-Jörg Schek,et al.  A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces , 1998, VLDB.

[10]  King-Ip Lin,et al.  An index structure for efficient reverse nearest neighbor queries , 2001, Proceedings 17th International Conference on Data Engineering.

[11]  Belur V. Dasarathy,et al.  Nearest neighbor (NN) norms: NN pattern classification techniques , 1991 .

[12]  Beng Chin Ooi,et al.  Gorder: An Efficient Method for KNN Join Processing , 2004, VLDB.

[13]  Pavel Zezula,et al.  M-tree: An Efficient Access Method for Similarity Search in Metric Spaces , 1997, VLDB.

[14]  S. Muthukrishnan,et al.  Influence sets based on reverse nearest neighbor queries , 2000, SIGMOD '00.

[15]  Yufei Tao,et al.  Reverse kNN Search in Arbitrary Dimensionality , 2004, VLDB.

[16]  Jianwen Su,et al.  Efficient index-based KNN join processing for high-dimensional data , 2007, Inf. Softw. Technol..

[17]  Christos Faloutsos,et al.  The TV-tree: An index structure for high-dimensional data , 1994, The VLDB Journal.

[18]  Christian S. Jensen,et al.  Multiple k Nearest Neighbor Query Processing in Spatial Network Databases , 2006, ADBIS.

[19]  Yufei Tao,et al.  Reverse Nearest Neighbor Search in Metric Spaces , 2006, IEEE Transactions on Knowledge and Data Engineering.

[20]  Raymond Chi-Wing Wong,et al.  On Efficient Spatial Matching , 2007, VLDB.

[21]  J. A. Hartigan,et al.  A k-means clustering algorithm , 1979 .

[22]  Beng Chin Ooi,et al.  Indexing the Distance: An Efficient Method to KNN Processing , 2001, VLDB.

[23]  Christian Böhm,et al.  Dynamically Optimizing High-Dimensional Index Structures , 2000, EDBT.

[24]  Alberto O. Mendelzon,et al.  Querying Time Series Data Based on Similarity , 2000, IEEE Trans. Knowl. Data Eng..