Down the Rabbit Hole: Robust Proximity Search and Density Estimation in Sublinear Space

For a set of n points in \Re^d, and parameters k and ε, we present a data structure that answers (1 + ε)-approximate k-nearest-neighbor queries in logarithmic time. Surprisingly, the space used by the data structure is \widetilde{O}(n/k); that is, the space is sublinear in the input size if k is sufficiently large. Our approach provides a novel way to summarize geometric data, such that meaningful proximity queries on the data can be carried out using only this sketch. Using this, we provide a sublinear-space data structure that can estimate the density of a point set under various measures, including: (i) the sum of distances from the query point to its k closest points, and (ii) the sum of squared distances from the query point to its k closest points. Our approach generalizes to other distance-based density estimates of a similar flavor.
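
To make the quantities named above concrete, the following is a minimal brute-force sketch of the exact values that such a data structure would estimate: the distance to the k-th nearest neighbor, the sum of distances to the k closest points, and the sum of their squared distances. It is not the sublinear-space structure described in the abstract; it scans all n points in linear space and time. The function name `knn_density_measures` and the dictionary keys are assumptions introduced for illustration.

```python
import heapq
import math


def knn_density_measures(points, query, k):
    """Exact k-nearest-neighbor density measures by linear scan.

    points: sequence of equal-length coordinate tuples
    query:  coordinate tuple of the same dimension
    k:      number of nearest neighbors, 1 <= k <= len(points)
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must satisfy 1 <= k <= number of points")

    # The k smallest Euclidean distances from the query, in ascending order.
    dists = heapq.nsmallest(k, (math.dist(p, query) for p in points))

    return {
        "kth_distance": dists[-1],                             # k-NN distance
        "sum_distances": sum(dists),                           # measure (i)
        "sum_squared_distances": sum(d * d for d in dists),    # measure (ii)
    }
```

A smaller value of either sum indicates that the query point lies in a denser region of the point set, which is why these sums can serve as density estimates.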
