Optimal Compression of Approximate Inner Products and Dimension Reduction

Let X be a set of n points of norm at most 1 in the Euclidean space R^k, and suppose ε > 0. An ε-distance sketch for X is a data structure that, given any two points of X, enables one to recover the square of the (Euclidean) distance between them up to an additive error of ε. Let f(n,k,ε) denote the minimum possible number of bits of such a sketch. Here we determine f(n,k,ε) up to a constant factor for all n ≥ k ≥ 1 and all ε ≥ 1/n^{0.49}. Our proof is algorithmic, and provides an efficient algorithm for computing a sketch of size O(f(n,k,ε)/n) for each point, so that the square of the distance between any two points can be computed from their sketches up to an additive error of ε in time linear in the length of the sketches. We also discuss the case of smaller ε > 2/√n and obtain some new results about dimension reduction in this range. In particular, we show that for any such ε and any k ≤ t = log(2 + ε^2 n)/ε^2 there are configurations of n points in R^k that cannot be embedded in R^ℓ for ℓ
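To make the notion of a per-point sketch concrete, here is a minimal illustration in Python. It is not the construction of the paper: it is a naive grid-rounding baseline, chosen only because its additive error guarantee is easy to verify. All function names (`make_sketch`, `sq_dist_from_sketches`, `bits_per_point`, `random_point_in_ball`) and the grid step ε/(5√k) are assumptions made for this example. Rounding each coordinate to a grid of step δ moves each point by at most δ√k/2, so the difference vector of two points moves by at most η = δ√k, and the squared distance changes by at most η(4 + η), which is below ε when η = ε/5 and ε ≤ 1.

```python
import math
import random


def grid_step(k, eps):
    # Assumed choice for this sketch: eta = delta * sqrt(k) = eps / 5.
    return eps / (5.0 * math.sqrt(k))


def make_sketch(point, eps):
    """Naive baseline sketch: round every coordinate to a grid of step delta.

    Assumes the point has Euclidean norm at most 1 and 0 < eps <= 1.
    Returns a tuple of small integers (the grid indices).
    """
    delta = grid_step(len(point), eps)
    return tuple(round(c / delta) for c in point)


def sq_dist_from_sketches(sx, sy, eps):
    """Estimate the squared distance from two sketches, in time linear in their length."""
    delta = grid_step(len(sx), eps)
    return delta * delta * sum((a - b) ** 2 for a, b in zip(sx, sy))


def bits_per_point(k, eps):
    """Bits used by this naive sketch: each grid index lies in [-ceil(1/delta), ceil(1/delta)]."""
    levels = 2 * math.ceil(1.0 / grid_step(k, eps)) + 1
    return k * math.ceil(math.log2(levels))


def random_point_in_ball(k, rng):
    """Hypothetical helper: a uniformly random point in the unit ball of R^k."""
    g = [rng.gauss(0.0, 1.0) for _ in range(k)]
    norm = math.sqrt(sum(c * c for c in g))
    radius = rng.random() ** (1.0 / k)
    return [radius * c / norm for c in g]


# Small demonstration: the worst additive error over all pairs stays below eps.
rng = random.Random(0)
k, n, eps = 8, 30, 0.1
points = [random_point_in_ball(k, rng) for _ in range(n)]
sketches = [make_sketch(p, eps) for p in points]
max_err = max(
    abs(
        sq_dist_from_sketches(sketches[i], sketches[j], eps)
        - sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    )
    for i in range(n)
    for j in range(n)
)
print("max additive error:", max_err, "bits per point:", bits_per_point(k, eps))
```

This baseline spends roughly k·log2(10√k/ε) bits per point, so it serves only as a reference for what "a sketch per point with additive error ε" means; the abstract's claim concerns the optimal size f(n,k,ε), which this example does not attempt to achieve.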
