On the exact space complexity of sketching and streaming small norms

We settle the 1-pass space complexity of (1 ± ε)-approximating the <i>L</i><sub><i>p</i></sub> norm, for real <i>p</i> with 1 ≤ <i>p</i> ≤ 2, of a length-<i>n</i> vector updated in a length-<i>m</i> stream with updates to its coordinates. We assume the updates are integers in the range [−<i>M</i>, <i>M</i>]. In particular, we show the space required is Θ(ε<sup>-2</sup> log(<i>mM</i>) + log log(<i>n</i>)) bits. Our result also holds for 0 < <i>p</i> < 1; although <i>L</i><sub><i>p</i></sub> is not a norm in this case, it remains a well-defined function. Our upper bound improves upon previous algorithms of [Indyk, JACM '06] and [Li, SODA '08]. This improvement comes from showing an improved derandomization of the <i>L</i><sub><i>p</i></sub> sketch of Indyk by using <i>k</i>-wise independence for small <i>k</i>, as opposed to using the heavy hammer of a generic pseudorandom generator against space-bounded computation such as Nisan's PRG. Our lower bound improves upon previous work of [Alon-Matias-Szegedy, JCSS '99] and [Woodruff, SODA '04], and is based on showing a direct sum property for the 1-way communication of the gap-Hamming problem.
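For orientation, the stable-distribution sketch that the upper bound derandomizes can be illustrated for the special case <i>p</i> = 1. The following is a minimal sketch under stated assumptions, not the algorithm of this paper: it draws fully independent standard Cauchy entries and stores the whole random matrix explicitly, so it does not achieve the stated space bound and does not use <i>k</i>-wise independence. The class name `CauchySketch` and its parameters `n`, `k`, and `seed` are hypothetical names chosen for illustration.

```python
import math
import random
from statistics import median


class CauchySketch:
    """Toy L1 sketch: k inner products of the vector with i.i.d. Cauchy rows.

    Assumption: rows are fully random and stored explicitly, so this is
    only an illustration of the estimator, not a small-space algorithm.
    """

    def __init__(self, n, k, seed=0):
        rng = random.Random(seed)
        # Standard Cauchy samples via the inverse CDF: tan(pi * (U - 1/2)).
        self.rows = [
            [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
            for _ in range(k)
        ]
        self.y = [0.0] * k  # y[j] tracks the inner product <rows[j], x>

    def update(self, i, delta):
        """Process one stream update x[i] += delta (delta may be negative)."""
        for j, row in enumerate(self.rows):
            self.y[j] += delta * row[i]

    def estimate(self):
        """Estimate ||x||_1 as the median of |y[j]|.

        By 1-stability each y[j] is distributed as ||x||_1 times a standard
        Cauchy variable, and the median of |Cauchy| is tan(pi/4) = 1.
        """
        return median(abs(v) for v in self.y)


if __name__ == "__main__":
    sketch = CauchySketch(n=20, k=1000, seed=1)
    x = [0] * 20
    for t in range(200):
        i, delta = t % 20, (t * 7 % 11) - 5  # integer updates in [-5, 5]
        sketch.update(i, delta)
        x[i] += delta
    print("true L1:", sum(abs(v) for v in x))
    print("estimate:", sketch.estimate())
```

Because the sketch is linear in the vector, insertions and deletions are handled by the same `update` call. The number of rows `k` plays the role of the ε<sup>-2</sup> factor: the standard deviation of the median estimator shrinks roughly like 1/√<i>k</i>.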

[1]  Noga Alon,et al.  Tracking join and self-join sizes in limited storage , 1999, PODS '99.

[2]  Peter Bro Miltersen,et al.  On Data Structures and Asymmetric Communication Complexity , 1998, J. Comput. Syst. Sci..

[3]  U. Haagerup The best constants in the Khintchine inequality , 1981 .

[4]  Philippe Flajolet,et al.  Probabilistic counting , 1983, 24th Annual Symposium on Foundations of Computer Science (sfcs 1983).

[5]  János Komlós,et al.  Storing a sparse table with O(1) worst case access time , 1982, 23rd Annual Symposium on Foundations of Computer Science (sfcs 1982).

[6]  C. Mallows,et al.  A Method for Simulating Stable Random Variables , 1976 .

[7]  J. Ian Munro,et al.  Selection and sorting with limited storage , 1978, 19th Annual Symposium on Foundations of Computer Science (sfcs 1978).

[8]  V. Zolotarev One-dimensional stable distributions , 1986 .

[9]  Mahesh Viswanathan,et al.  An Approximate L1-Difference Algorithm for Massive Data Streams , 2003 .

[10]  David P. Woodruff,et al.  1-pass relative-error Lp-sampling with applications , 2010, SODA '10.

[11]  Noga Alon,et al.  The Space Complexity of Approximating the Frequency Moments , 1999 .

[12]  Ping Li,et al.  Compressed counting , 2008, SODA.

[13]  Rocco A. Servedio,et al.  Bounded Independence Fools Halfspaces , 2009, 2009 50th Annual IEEE Symposium on Foundations of Computer Science.

[14]  S. Muthukrishnan,et al.  Data streams: algorithms and applications , 2005, SODA '03.

[15]  Piotr Indyk,et al.  Stable distributions, pseudorandom generators, embeddings, and data stream computation , 2006, JACM.

[16]  Piotr Indyk,et al.  Algorithms for dynamic geometric problems over data streams , 2004, STOC '04.

[17]  Graham Cormode,et al.  A near-optimal algorithm for computing the entropy of a stream , 2007, SODA '07.

[18]  K. Friedrichs The identity of weak and strong extensions of differential operators , 1944 .

[19]  David P. Woodruff Efficient and private distance approximation in the communication and streaming models , 2007 .

[20]  J. L. Nolan Stable Distributions. Models for Heavy Tailed Data , 2001 .

[21]  Krzysztof Onak,et al.  Sketching and Streaming Entropy via Approximation Theory , 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[22]  Ravi Kumar,et al.  The One-Way Communication Complexity of Hamming Distance , 2008, Theory Comput..

[23]  David P. Woodruff Efficient and Private Distance Approximation in the Communication and Streaming Models , 2007 .

[24]  Graham Cormode,et al.  On Estimating Frequency Moments of Data Streams , 2007, APPROX-RANDOM.

[25]  Jessica H. Fong,et al.  An Approximate Lp Difference Algorithm for Massive Data Streams , 1999, Discret. Math. Theor. Comput. Sci..

[26]  Joshua Brody,et al.  A Multi-Round Communication Lower Bound for Gap Hamming and Some Consequences , 2009, 2009 24th Annual IEEE Conference on Computational Complexity.

[27]  Ping Li,et al.  Estimators and tail bounds for dimension reduction in lα (0 < α ≤ 2) using stable random projections , 2008, SODA '08.

[28]  Subhash Khot,et al.  Near-optimal lower bounds on the multi-party communication complexity of set disjointness , 2003, 18th IEEE Annual Conference on Computational Complexity, 2003. Proceedings..

[29]  A. Razborov Communication Complexity , 2011 .

[30]  David P. Woodruff Optimal space lower bounds for all frequency moments , 2004, SODA '04.

[31]  Sudipto Guha,et al.  Fast, small-space algorithms for approximate histogram maintenance , 2002, STOC '02.

[32]  David P. Woodruff,et al.  The Data Stream Space Complexity of Cascaded Norms , 2009, 2009 50th Annual IEEE Symposium on Foundations of Computer Science.

[33]  Balachander Krishnamurthy,et al.  Sketch-based change detection: methods, evaluation, and applications , 2003, IMC '03.

[34]  Ziv Bar-Yossef,et al.  An information statistics approach to data stream and communication complexity , 2002, The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings..

[35]  Peter Bro Miltersen,et al.  On data structures and asymmetric communication complexity , 1994, STOC '95.

[36]  David P. Woodruff,et al.  Numerical linear algebra in the streaming model , 2009, STOC '09.

[37]  David P. Woodruff,et al.  Coresets and sketches for high dimensional subspace approximation problems , 2010, SODA '10.

[38]  Noam Nisan,et al.  Pseudorandom generators for space-bounded computation , 1992, Comb..

[39]  Mahesh Viswanathan,et al.  An Approximate L1-Difference Algorithm for Massive Data Streams , 2002, SIAM J. Comput..

[40]  Makoto Yamazato,et al.  Unimodality of Infinitely Divisible Distribution Functions of Class $L$ , 1978 .

[41]  Robert Krauthgamer,et al.  The Sketching Complexity of Pattern Matching , 2004, APPROX-RANDOM.

[42]  David P. Woodruff,et al.  Lower bounds for sparse recovery , 2010, SODA '10.

[43]  Piotr Indyk,et al.  Fast mining of massive tabular data via approximate distance computations , 2002, Proceedings 18th International Conference on Data Engineering.