Efficient Density Evaluation for Smooth Kernels

Given a kernel function k(·,·) and a dataset P ⊂ ℝ^d, the kernel density function of P at a point x ∈ ℝ^d is defined as KDF_P(x) := (1/|P|) Σ_{y∈P} k(x, y). Kernel density evaluation has numerous applications in scientific computing, statistics, computer vision, machine learning, and other fields. In all of them it is necessary to evaluate KDF_P(x) quickly, often for many inputs x and large point sets P. In this paper we present a collection of algorithms for efficient KDF evaluation under the assumption that the kernel k is "smooth", i.e., its value changes at most polynomially with the distance. This assumption is satisfied by several well-studied kernels, including the (generalized) t-Student kernel and the rational quadratic kernel. For smooth kernels, we give a data structure that, after O(dn log(Φn)/ε²) preprocessing, estimates KDF_P(x) up to a factor of 1 ± ε in O(d log(Φn)/ε²) time, where Φ is the aspect ratio. The log(Φn) term can be further replaced by log n under an additional decay condition on k, which is satisfied by the aforementioned examples. We further extend these results in two ways. First, we use low-distortion embeddings to extend the results to kernels defined over spaces other than ℓ_2. The key feature of this reduction is that the distortion of the embedding affects only the running time of the algorithm, not the accuracy of the estimate. As a result, we obtain (1+ε)-approximate estimation algorithms for kernels over other ℓ_p norms, Earth Mover's Distance, and other metric spaces. Second, for smooth kernels that are decreasing with distance, we present a general reduction from density estimation to approximate near neighbor search in the underlying space. This allows us to construct algorithms for general doubling metrics, as well as alternative algorithms for ℓ_p norms and other spaces.
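To make the definition concrete, the following is a minimal brute-force sketch of KDF evaluation with the rational quadratic kernel, one of the smooth kernels mentioned above. This is a naive O(|P|·d)-per-query baseline for illustration only, not the data structure described in the paper; the kernel form k(x, y) = 1/(1 + ‖x − y‖²/c) and the parameter c are standard but the helper names are our own.

```python
def rational_quadratic(x, y, c=1.0):
    # k(x, y) = 1 / (1 + ||x - y||^2 / c): a "smooth" kernel, since its
    # value changes at most polynomially with the distance ||x - y||.
    d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return 1.0 / (1.0 + d2 / c)

def kdf(P, x, kernel=rational_quadratic):
    # KDF_P(x) = (1/|P|) * sum_{y in P} k(x, y), evaluated exactly.
    return sum(kernel(x, y) for y in P) / len(P)

P = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
print(kdf(P, (0.0, 0.0)))  # average of k-values 1, 1/2, 1/5
```

The algorithms in the paper replace this exact linear scan with a data structure whose query time depends only logarithmically on n (and on the aspect ratio Φ), at the cost of a 1 ± ε multiplicative error.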
