Dimensional Testing for Reverse k-Nearest Neighbor Search

Given a query object q, reverse k-nearest neighbor (RkNN) search aims to locate those objects of the database that have q among their k nearest neighbors. In this paper, we propose an approximation method for solving RkNN queries, where the pruning operations and termination tests are guided by a characterization of the intrinsic dimensionality of the data. The method can accommodate any index structure supporting incremental (forward) nearest-neighbor search for the generation and verification of candidates, while avoiding impractically high preprocessing costs. We also provide experimental evidence that our method significantly outperforms its competitors in terms of the tradeoff between execution time and the quality of the approximation. Our approach thus addresses many of the scalability issues surrounding the use of previous methods in data mining.
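To make the RkNN definition concrete, the following is a minimal brute-force sketch in Python. It is not the approximation method proposed in the paper: it is an exact quadratic-time baseline with no pruning and no dimensional test. The function name `rknn_brute_force`, the use of Euclidean distance, the assumption that q is not itself a database object, and the tie-handling rule (q counts as a k-nearest neighbor when tied) are all illustrative assumptions.

```python
import math


def rknn_brute_force(data, q, k):
    """Return indices of the points in `data` that have q among their k nearest neighbors.

    Hypothetical exact baseline (quadratic time); assumes Euclidean distance
    and that q is not itself a member of `data`.
    """
    result = []
    for i, x in enumerate(data):
        d_q = math.dist(x, q)
        # Count database points, other than x itself, strictly closer to x than q is.
        closer = sum(
            1 for j, y in enumerate(data) if j != i and math.dist(x, y) < d_q
        )
        # q is among the k nearest neighbors of x iff fewer than k points beat it.
        if closer < k:
            result.append(i)
    return result


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
print(rknn_brute_force(points, (2.0, 0.0), 1))  # prints [1]
```

In this example only the point (1.0, 0.0) has q = (2.0, 0.0) as its nearest neighbor, which illustrates that the RkNN relation is not symmetric: membership depends on the neighborhood of each database object rather than on its distance from q alone.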
