ERkNN: efficient reverse k-nearest neighbors retrieval with local kNN-distance estimation

Reverse k-nearest neighbor (RkNN) queries are important in profile-based marketing, information retrieval, decision support, and data mining systems. However, they are very expensive to evaluate, and existing algorithms do not scale to queries in high-dimensional spaces or with large values of k. This paper describes an efficient estimation-based RkNN search algorithm (ERkNN) that answers RkNN queries using local kNN-distance estimation methods. The proposed approach uses an estimation-based filtering strategy to lower the computation cost of RkNN queries. The results of extensive experiments on both synthetic and real-life datasets demonstrate that the ERkNN algorithm retrieves RkNN results efficiently and scales with respect to data dimensionality, k, and data size.
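
To make the query concrete, the following is a minimal brute-force sketch of the standard RkNN definition: a point p belongs to the result if the query q is at least as close to p as p's k-th nearest neighbor in the dataset. This is not the ERkNN algorithm; it is the exhaustive baseline whose cost estimation-based filtering aims to avoid. The function names `knn_distance` and `rknn_brute_force`, the Euclidean metric, and the toy data are assumptions made for illustration.

```python
import math

def knn_distance(points, i, k):
    """Exact distance from points[i] to its k-th nearest neighbor among the other points."""
    dists = sorted(math.dist(points[i], p) for j, p in enumerate(points) if j != i)
    return dists[k - 1]

def rknn_brute_force(points, query, k):
    """Indices of points that would count `query` among their k nearest neighbors.

    Computes an exact kNN-distance for every point, which is the expensive
    step that an estimation-based filter would try to skip for most points.
    """
    return [
        i for i in range(len(points))
        if math.dist(points[i], query) <= knn_distance(points, i, k)
    ]

# Toy one-dimensional example (hypothetical data).
points = [(0.0, 0.0), (4.0, 0.0), (10.0, 0.0)]
query = (5.0, 0.0)
print(rknn_brute_force(points, query, k=1))  # [1, 2]
print(rknn_brute_force(points, query, k=2))  # [0, 1, 2]
```

With k=1, the point at the origin is excluded because its nearest neighbor (distance 4) is closer than the query (distance 5). The sketch needs a sort per data point, which illustrates why exact evaluation becomes costly as the data size and k grow.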
