High-dimensional kNN joins with incremental updates

The k Nearest Neighbor (kNN) join operation associates each data object in one data set with its k nearest neighbors from the same or a different data set. The kNN join on high-dimensional data (high-dimensional kNN join) is a very expensive operation. Existing high-dimensional kNN join algorithms were designed for static data sets and therefore cannot handle updates efficiently. In this article, we propose a novel kNN join method, named kNNJoin+, which supports efficient incremental computation of kNN join results as the underlying high-dimensional data are updated. As a by-product, our method also answers reverse kNN queries with very little overhead. We have performed an extensive experimental study, and the results show the effectiveness of kNNJoin+ for processing high-dimensional kNN joins in dynamic workloads.
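To make the operation concrete, here is a minimal brute-force sketch in Python. It assumes points are tuples of floats, Euclidean distance, and unique, mutually comparable object ids. It is not kNNJoin+, whose internals the abstract does not describe, and the class `IncrementalKNNJoin` and its method names are hypothetical. For every object r in R it keeps the k nearest neighbours in S and maintains the inverted lists alongside. This illustrates why reverse kNN answers can come cheaply once join results are kept up to date.

```python
import bisect
import heapq
import math
from collections import defaultdict


class IncrementalKNNJoin:
    """Brute-force kNN join of R with S, maintained under inserts and deletes.

    Hypothetical illustration, NOT kNNJoin+: every distance computation is a
    linear scan where a real system would use a high-dimensional index.
    Object ids must be unique and mutually comparable (they break ties).
    """

    def __init__(self, k, R, S):
        self.k = k
        self.R = dict(R)              # r id -> point (tuple of floats)
        self.S = dict(S)              # s id -> point
        self.knn = {}                 # r id -> ascending list of (distance, s id)
        self.rknn = defaultdict(set)  # s id -> ids of r whose kNN list contains s
        for rid in self.R:
            self._recompute(rid)

    def _recompute(self, rid):
        """Rebuild the kNN list of one r by scanning all of S."""
        for _, sid in self.knn.get(rid, []):
            self.rknn[sid].discard(rid)
        p = self.R[rid]
        best = heapq.nsmallest(
            self.k, ((math.dist(p, q), sid) for sid, q in self.S.items()))
        self.knn[rid] = best
        for _, sid in best:
            self.rknn[sid].add(rid)

    def insert_r(self, rid, p):
        self.R[rid] = p
        self._recompute(rid)

    def delete_r(self, rid):
        for _, sid in self.knn.pop(rid):
            self.rknn[sid].discard(rid)
        del self.R[rid]

    def insert_s(self, sid, q):
        """A new s enters only the lists where it beats the current k-th entry."""
        self.S[sid] = q
        for rid, p in self.R.items():
            entry = (math.dist(p, q), sid)
            lst = self.knn[rid]
            if len(lst) < self.k or entry < lst[-1]:
                if len(lst) == self.k:
                    _, evicted = lst.pop()
                    self.rknn[evicted].discard(rid)
                bisect.insort(lst, entry)
                self.rknn[sid].add(rid)

    def delete_s(self, sid):
        """Only the r objects that had sid among their k NNs are recomputed."""
        del self.S[sid]
        for rid in list(self.rknn.get(sid, ())):
            self._recompute(rid)
        self.rknn.pop(sid, None)

    def reverse_knn(self, sid):
        """Reverse kNN of s: every r whose kNN list contains s."""
        return set(self.rknn.get(sid, ()))
```

The inverted map `rknn` serves double duty in this sketch. It answers reverse kNN queries by lookup, and it limits the work after a deletion from S to exactly the affected r objects. For a self-join, the same objects can be passed as R and S. Each object then appears as its own nearest neighbour at distance 0, which a practical implementation would usually exclude.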