An Efficient Sum Query Algorithm for Distance-Based Locally Dominating Functions

In this paper, we consider the following sum query problem: given a point set P in $${\mathbb {R}}^d$$ and a distance-based function f(p, q) (i.e., a function of the distance between p and q) satisfying some general properties, the goal is to develop a data structure and a query algorithm for efficiently computing a $$(1+\epsilon )$$-approximate solution to the sum $$\sum _{p \in P} f(p,q)$$ for any query point $$q \in {\mathbb {R}}^d$$ and any small constant $$\epsilon >0$$. Existing techniques for this problem are mainly based on core-sets, which often have difficulty handling functions with the local domination property. Based on several new insights into this problem, we develop a novel technique to overcome these difficulties. Our algorithm answers queries with high success probability in time no more than $${\tilde{O}}_{\epsilon ,d}(n^{0.5 + c})$$, and the underlying data structure can be constructed in $${\tilde{O}}_{\epsilon ,d}(n^{1+c})$$ time for any $$c>0$$, where the hidden constant has only polynomial dependence on $$1/\epsilon$$ and d. Our technique is simple and can be easily implemented for practical purposes.
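To make the problem statement concrete, the following minimal Python sketch shows the exact linear-time baseline that a sum query data structure is meant to beat, together with a check of the approximation guarantee. It is not the algorithm of this paper. The function `example_f` is a hypothetical distance-based function chosen only for illustration, and the two-sided relative-error test in `is_within_factor` is one common reading of a $$(1+\epsilon )$$-approximation, assumed here rather than taken from the text.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def euclidean(p: Point, q: Point) -> float:
    """Euclidean distance between two points of the same dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def example_f(p: Point, q: Point) -> float:
    """Hypothetical distance-based function (an assumption for illustration):
    it depends on p and q only through their distance."""
    return 1.0 / (1.0 + euclidean(p, q) ** 2)


def exact_sum(points: Sequence[Point], q: Point,
              f: Callable[[Point, Point], float] = example_f) -> float:
    """Brute-force baseline: evaluates f on every point, O(n) work per query."""
    return math.fsum(f(p, q) for p in points)


def is_within_factor(estimate: float, exact: float, eps: float) -> bool:
    """Assumed reading of a (1 + eps)-approximation:
    relative error of at most eps against the exact sum."""
    return abs(estimate - exact) <= eps * exact


if __name__ == "__main__":
    P = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
    q = (0.0, 0.0)
    s = exact_sum(P, q)
    print(s, is_within_factor(1.05 * s, s, eps=0.1))
```

Any approximate query procedure can be compared against `exact_sum` on small inputs in this way before measuring its running time on larger ones.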
