Space lower bounds for distance approximation in the data stream model

We consider the problem of approximating the distance between two <i>d</i>-dimensional vectors <b>x</b> and <b>y</b> in the data stream model. In this model, the 2<i>d</i> coordinates are presented as a "stream" of data in some arbitrary order, where each data item includes the index and value of some coordinate and a bit that identifies the vector (<b>x</b> or <b>y</b>) to which it belongs. The goal is to minimize the amount of memory needed to approximate the distance. For the case of <i>L</i><sup><i>p</i></sup> distance with <i>p</i> ∈ [1,2], there are good approximation algorithms that run in space polylogarithmic in <i>d</i> (here we assume that each coordinate is an integer with <i>O</i>(log <i>d</i>) bits). We prove that no such algorithms exist for <i>p</i> &gt; 2. In particular, we prove an optimal approximation-space tradeoff for approximating the <i>L</i><sup>∞</sup> distance of two vectors. We show that any randomized algorithm that approximates the <i>L</i><sup>∞</sup> distance of two length-<i>d</i> vectors within a factor of <i>d</i><sup>δ</sup> requires Ω(<i>d</i><sup>1 − 4δ</sup>) space. As a consequence, we show that for <i>p</i> &gt; 2/(1 − 4δ), any randomized algorithm that approximates the <i>L</i><sup><i>p</i></sup> distance of two length-<i>d</i> vectors within a factor of <i>d</i><sup>δ</sup> requires Ω(<i>d</i><sup>1 − 2/<i>p</i> − 4δ</sup>) space. The lower bound follows from a lower bound on the two-party one-round communication complexity of this problem. This lower bound is proved using a combination of information theory and Fourier analysis.
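
To make the stream model concrete, the following is a minimal Python sketch and is not taken from the paper. It assumes each stream item is a tuple (index, value, which), with which set to "x" or "y". The names stream_distance and space_exponent are hypothetical. The first function is only a naive baseline that keeps one running difference per coordinate, so it uses memory linear in d, which is the cost that small-space algorithms try to avoid. The second simply evaluates the exponent 1 − 2/p − 4δ from the stated bound, treating 2/p as 0 when p is infinite.

```python
import math
from collections import defaultdict


def stream_distance(stream, p):
    """Exact L_p distance computed from a stream of (index, value, which) items.

    Naive baseline (an illustration, not the paper's method): it stores one
    running difference per coordinate, so memory grows linearly with d.
    """
    diff = defaultdict(int)  # diff[i] accumulates x_i - y_i
    for index, value, which in stream:
        diff[index] += value if which == "x" else -value
    if math.isinf(p):
        # L_infinity distance is the largest absolute coordinate difference.
        return max((abs(v) for v in diff.values()), default=0)
    return sum(abs(v) ** p for v in diff.values()) ** (1.0 / p)


def space_exponent(p, delta):
    """Exponent e in the stated Omega(d^e) bound: e = 1 - 2/p - 4*delta.

    For p = infinity the 2/p term is taken to be 0, giving 1 - 4*delta.
    """
    two_over_p = 0.0 if math.isinf(p) else 2.0 / p
    return 1.0 - two_over_p - 4.0 * delta


if __name__ == "__main__":
    # Items arrive in arbitrary order; here x = (3, 5) and y = (1, 1).
    items = [(0, 3, "x"), (1, 1, "y"), (0, 1, "y"), (1, 5, "x")]
    print(stream_distance(items, math.inf))
    print(space_exponent(math.inf, 0.125))
```

The exponent is positive exactly when p > 2/(1 − 4δ), which matches the range of p for which the abstract states the bound.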
