Paper information - Hashing-Based-Estimators for Kernel Density in High Dimensions

Given a set of points $P \subset \mathbb{R}^d$ and a kernel $k$, the Kernel Density Estimate at a point $x \in \mathbb{R}^d$ is defined as $\mathrm{KDE}_{P}(x)=\frac{1}{|P|}\sum_{y\in P} k(x,y)$. We study the problem of designing a data structure that, given a data set $P$ and a kernel function, returns approximations to the kernel density of a query point in sublinear time. We introduce a class of unbiased estimators for kernel density implemented through locality-sensitive hashing, and give general theorems bounding the variance of such estimators. These estimators give rise to efficient data structures for estimating the kernel density in high dimensions for a variety of commonly used kernels. Our work is the first to provide data structures with theoretical guarantees that improve upon simple random sampling in high dimensions.
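To make the definition concrete, the following is a minimal sketch of the exact kernel density sum and of the simple random sampling baseline that the abstract compares against. It is not the hashing-based estimator of the paper. The Gaussian kernel, the `bandwidth` parameter, the function names, and the synthetic data are all assumptions chosen for illustration.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel exp(-||x - y||^2 / bandwidth^2).

    The kernel and its bandwidth are illustrative assumptions.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / bandwidth ** 2)


def kde_exact(points, x, kernel=gaussian_kernel):
    """KDE_P(x) = (1/|P|) * sum over y in P of k(x, y); linear time in |P|."""
    return sum(kernel(x, y) for y in points) / len(points)


def kde_random_sampling(points, x, num_samples, rng, kernel=gaussian_kernel):
    """Unbiased estimate: average k(x, y) over y drawn uniformly with replacement."""
    total = 0.0
    for _ in range(num_samples):
        total += kernel(x, rng.choice(points))
    return total / num_samples


# Hypothetical data: 2000 standard Gaussian points in 5 dimensions.
rng = random.Random(0)
points = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(2000)]
query = [0.0] * 5

exact = kde_exact(points, query)
approx = kde_random_sampling(points, query, num_samples=500, rng=rng)
print(f"exact={exact:.4f} sampled={approx:.4f}")
```

The sampling estimator evaluates the kernel only `num_samples` times per query, but its relative variance grows as the true density shrinks. That is the regime in which the abstract claims improvements from estimators built on locality-sensitive hashing.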

Moses Charikar | Paris Siminelakis
