New Characterizations in Turnstile Streams with Applications

Recently, [Li, Nguyen, Woodruff, STOC'2014] showed that any 1-pass constant-probability streaming algorithm for computing a relation f on a vector x ∈ {−m, −(m − 1), ..., m}^n presented in the turnstile data stream model can be implemented by maintaining a linear sketch A · x mod q, where A is an r × n integer matrix and q = (q_1, ..., q_r) is a vector of positive integers. The space complexity of maintaining A · x mod q, not including the random bits used for sampling A and q, matches the space of the optimal algorithm. We give multiple strengthenings of this reduction, together with new applications. In particular, we show how to remove the following shortcomings of their reduction:

1. The Box Constraint. Their reduction applies only to algorithms that must be correct even if ‖x‖_∞ = max_{i∈[n]} |x_i| is allowed to be much larger than m at intermediate points in the stream, provided that x ∈ {−m, −(m − 1), ..., m}^n at the end of the stream. We give a condition under which the optimal algorithm is a linear sketch even if it works only when promised that x ∈ {−m, −(m − 1), ..., m}^n at all points in the stream. Using this, we show the first super-constant Ω(log m) bits lower bound for the problem of maintaining a counter up to an additive εm error in a turnstile stream, where ε is any constant in (0, 1/2). Previous lower bounds are based on communication complexity and are only for relative error approximation; interestingly, we do not know how to prove our result using communication complexity. More generally, we show the first super-constant Ω(log m) lower bound for additive approximation of ℓ_p-norms; this bound is tight for 1 ≤ p ≤ 2.

2. Negative Coordinates. Their reduction allows x_i to be negative while processing the stream. We show an equivalence between 1-pass algorithms and linear sketches A · x mod q in dynamic graph streams, or more generally, the strict turnstile model, in which for all i ∈ [n], x_i ≥ 0 at all points in the stream. Combined with [Assadi, Khanna, Li, Yaroslavtsev, SODA'2016], this resolves the 1-pass space complexity of approximating the maximum matching in a dynamic graph stream, answering a question in that work.

3. 1-Pass Restriction. Their reduction applies only to 1-pass data stream algorithms in the turnstile model, while there exist algorithms for heavy hitters and for low rank approximation which provably do better with multiple passes. We extend the reduction to algorithms which make any number of passes, showing the optimal algorithm is to choose a new linear sketch at the beginning of each pass, based on the output of previous passes.
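To make the object at the center of these reductions concrete, the following is a minimal sketch of maintaining A · x mod q under turnstile updates. It is an illustration only, not the construction from either paper: the class name ModularLinearSketch, the helper sketch_of, and the small example matrix and moduli are all assumptions chosen for the demonstration. It shows only that the stored state depends on the current vector x and not on the order or history of updates.

```python
class ModularLinearSketch:
    """Stores s = A . x mod q (coordinate-wise) without storing x itself.

    Hypothetical illustration: A is an r x n integer matrix given as a list
    of rows, and q is a list of r positive integer moduli.
    """

    def __init__(self, A, q):
        if len(A) != len(q):
            raise ValueError("A must have one row per modulus in q")
        self.A = A
        self.q = q
        self.state = [0] * len(A)

    def update(self, i, delta):
        """Process the turnstile update x_i <- x_i + delta."""
        for j, (row, qj) in enumerate(zip(self.A, self.q)):
            # Linearity: adding delta to x_i adds A[j][i] * delta to row j.
            # Python's % returns a value in [0, qj) for positive qj.
            self.state[j] = (self.state[j] + row[i] * delta) % qj


def sketch_of(A, q, x):
    """Compute A . x mod q directly from a fully known vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) % qj
            for row, qj in zip(A, q)]


# Small worked example with assumed values.
A = [[1, 2], [3, 4]]
q = [5, 7]

sk = ModularLinearSketch(A, q)
sk.update(0, +3)   # x = (3, 0)
sk.update(1, -1)   # x = (3, -1)

# The same final vector reached through a different update sequence.
sk_other = ModularLinearSketch(A, q)
for i, delta in [(1, -4), (0, 1), (1, 3), (0, 2)]:
    sk_other.update(i, delta)
```

Both sketches end in the same state as sketch_of(A, q, [3, -1]), because the state is a linear function of x reduced modulo q. The space used is governed by r and the sizes of the moduli rather than by the number of updates seen.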

[1] Ilan Newman, et al. Private vs. Common Random Bits in Communication Complexity, 1991, Inf. Process. Lett.

[2] David P. Woodruff. Low Rank Approximation Lower Bounds in Row-Update Streams, 2014, NIPS.

[3] Noga Alon, et al. The Space Complexity of Approximating the Frequency Moments, 1999.

[4] Piotr Indyk, et al. Stable distributions, pseudorandom generators, embeddings, and data stream computation, 2006, JACM.

[5] David P. Woodruff, et al. The Simultaneous Communication of Disjointness with Applications to Data Streams, 2015, ICALP.

[6] S. Muthukrishnan, et al. Data streams: algorithms and applications, 2005, SODA '03.

[7] David P. Woodruff, et al. On the Power of Adaptivity in Sparse Recovery, 2011, 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science.

[8] J. Gathen, et al. A bound on solutions of linear integer equalities and inequalities, 1978.

[9] Andrew Chi-Chih Yao, et al. Some complexity questions related to distributive computing (Preliminary Report), 1979, STOC.

[10] David P. Woodruff, et al. On the exact space complexity of sketching and streaming small norms, 2010, SODA '10.

[11] David P. Woodruff, et al. Turnstile streaming algorithms might as well be linear sketches, 2014, STOC.

[12] Christos Boutsidis, et al. Optimal principal component analysis in distributed and streaming models, 2015, STOC.

[13] Yang Li, et al. Maximum Matchings in Dynamic Graph Streams and the Simultaneous Communication Model, 2016, SODA.

[14] Sumit Ganguly, et al. Lower Bounds on Frequency Estimation of Data Streams (Extended Abstract), 2008, CSR.

[15] Noga Alon, et al. The space complexity of approximating the frequency moments, 1996, STOC '96.