We consider the problem of approximating the distance between two <i>d</i>-dimensional vectors <b>x</b> and <b>y</b> in the data stream model. In this model, the 2<i>d</i> coordinates are presented as a "stream" of data in some arbitrary order, where each data item includes the index and value of some coordinate and a bit that identifies the vector (<b>x</b> or <b>y</b>) to which it belongs. The goal is to minimize the amount of memory needed to approximate the distance. For the case of <i>L</i><sup><i>p</i></sup> distance with <i>p</i> ∈ [1,2], there are good approximation algorithms that run in space polylogarithmic in <i>d</i> (here we assume that each coordinate is an integer with <i>O</i>(log <i>d</i>) bits). Here we prove that no such algorithms exist for <i>p</i> > 2. In particular, we prove an optimal approximation-space tradeoff for approximating the <i>L</i><sup>∞</sup> distance of two vectors. We show that any randomized algorithm that approximates the <i>L</i><sup>∞</sup> distance of two length-<i>d</i> vectors within a factor of <i>d</i><sup>δ</sup> requires Ω(<i>d</i><sup>1 − 4δ</sup>) space. As a consequence, we show that for <i>p</i> > 2/(1 − 4δ), any randomized algorithm that approximates the <i>L</i><sup><i>p</i></sup> distance of two length-<i>d</i> vectors within a factor of <i>d</i><sup>δ</sup> requires Ω(<i>d</i><sup>1 − 2/<i>p</i> − 4δ</sup>) space. The lower bound follows from a lower bound on the two-party one-round communication complexity of this problem. This lower bound is proved using a combination of information theory and Fourier analysis.
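To make the stream model concrete, the following is a minimal sketch of the naive baseline that the space bounds are measured against: it stores one counter per coordinate, so it uses space linear in <i>d</i> and computes the distance exactly. It is not an algorithm from the paper. The item layout `(index, value, which)` and the function name `exact_lp_distance` are assumptions made for illustration only.

```python
from typing import Iterable, Tuple

# Assumed item layout (hypothetical): (index, value, which),
# where which == 0 means the coordinate belongs to x and which == 1 to y.
StreamItem = Tuple[int, int, int]


def exact_lp_distance(stream: Iterable[StreamItem], d: int, p: float) -> float:
    """Exact L^p distance of x and y read from a stream, using O(d) counters."""
    # diff[i] accumulates x_i - y_i; items may arrive in any order.
    diff = [0] * d
    for index, value, which in stream:
        diff[index] += value if which == 0 else -value

    abs_diff = [abs(v) for v in diff]
    if p == float("inf"):
        # L^infinity distance is the largest coordinate-wise gap.
        return float(max(abs_diff, default=0))
    return sum(v ** p for v in abs_diff) ** (1.0 / p)


# Example: x = (3, 5), y = (1, 1), coordinates interleaved arbitrarily.
items = [(0, 3, 0), (1, 1, 1), (0, 1, 1), (1, 5, 0)]
linf = exact_lp_distance(items, d=2, p=float("inf"))
l1 = exact_lp_distance(items, d=2, p=1)
```

The sketch keeps all <i>d</i> differences in memory, which is exactly the cost the polylogarithmic-space algorithms for <i>p</i> ∈ [1,2] avoid; the lower bounds stated above say that for <i>p</i> > 2 no randomized approximation algorithm can reduce the space to polylogarithmic in <i>d</i>.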