Separations and equivalences between turnstile streaming and linear sketching

A longstanding observation, partially proven by Li, Nguyen, and Woodruff in 2014 [26] and extended by Ai, Hu, Li, and Woodruff in 2016 [19], is that any turnstile streaming algorithm can be implemented as a linear sketch (the reverse is trivially true). We study the relationship between turnstile streaming and linear sketching algorithms in more detail, giving both new separations and new equivalences between the two models. Li, Nguyen, and Woodruff showed that if a turnstile algorithm works for arbitrarily long streams with arbitrarily large coordinates at intermediate stages of the stream, then it is equivalent to a linear sketch. We show separations of the opposite form: if either the stream length or the maximum value of the stream is substantially restricted, there exist problems where linear sketching is exponentially harder than turnstile streaming. A further limitation of the Li, Nguyen, and Woodruff equivalence is that the resulting sketching algorithm is neither explicit nor uniform, but requires an exponentially long advice string. We show how to remove this limitation for deterministic streaming algorithms: we give an explicit small-space algorithm that takes the streaming algorithm and computes an equivalent module.
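To illustrate the direction described as trivially true, the following is a minimal sketch, not taken from the paper, of how a linear sketch y = Ax can be maintained under turnstile updates x[i] += delta. The class name `LinearSketch`, the helper `sketch_of_vector`, the random sign matrix, and the example stream are all assumptions chosen for illustration. For clarity the matrix columns are stored explicitly, which a genuinely small-space algorithm would not do.

```python
import random


class LinearSketch:
    """Maintains y = A x for a fixed pseudorandom sign matrix A (k rows, n columns)."""

    def __init__(self, n, k, seed=0):
        rng = random.Random(seed)
        # Column i of A, stored explicitly for clarity (illustrative only;
        # a small-space version would regenerate it from a hash of i).
        self.columns = [[rng.choice((-1, 1)) for _ in range(k)] for _ in range(n)]
        self.y = [0] * k

    def update(self, i, delta):
        """Process a turnstile update x[i] += delta (delta may be negative)."""
        col = self.columns[i]
        for r in range(len(self.y)):
            self.y[r] += delta * col[r]


def sketch_of_vector(sketch, x):
    """Compute A x directly from a final frequency vector x."""
    k = len(sketch.y)
    return [sum(sketch.columns[i][r] * x[i] for i in range(len(x))) for r in range(k)]


# A hypothetical stream of (coordinate, delta) updates with insertions and deletions.
stream = [(0, 3), (2, -1), (0, -3), (4, 5), (2, 1), (1, 2)]

s = LinearSketch(n=5, k=3, seed=1)
for i, delta in stream:
    s.update(i, delta)

# The final frequency vector produced by the stream above.
final_x = [0, 2, 0, 0, 5]

# By linearity, the streamed state depends only on the final vector.
assert s.y == sketch_of_vector(s, final_x)
```

Because each update adds a multiple of one column of A, the state after the stream depends only on the final vector, not on the order or length of the stream. The equivalence results discussed above concern the much harder converse direction.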

[1]  Eric Price, et al. The Sketching Complexity of Graph and Hypergraph Counting, 2018, IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS).

[2]  Sudipto Guha, et al. Analyzing graph structure via linear measurements, 2012, SODA.

[3]  Piotr Indyk, et al. K-median clustering, model-based compressive sensing, and sparse recovery for earth mover distance, 2011, STOC '11.

[4]  Yang Li, et al. On Estimating Maximum Matching Size in Graph Streams, 2017, SODA.

[5]  David P. Woodruff, et al. Data Streams with Bounded Deletions, 2018, PODS.

[6]  Elchanan Mossel, et al. Linear Sketching over F_2, 2018, CCC.

[7]  Noga Alon, et al. The space complexity of approximating the frequency moments, 1996, STOC '96.

[8]  Huacheng Yu, et al. Optimal Lower Bounds for Distributed and Streaming Spanning Forest Computation, 2018, Electron. Colloquium Comput. Complex.

[9]  Piotr Indyk, et al. Stable distributions, pseudorandom generators, embeddings, and data stream computation, 2006, JACM.

[10]  Sumit Ganguly, et al. Lower Bounds on Frequency Estimation of Data Streams (Extended Abstract), 2008, CSR.

[11]  Noam Nisan, et al. Pseudorandom generators for space-bounded computation, 1992, Comb.

[12]  Christos Faloutsos, et al. DOULION: counting triangles in massive graphs with a coin, 2009, KDD.

[13]  Christian Konrad, et al. Maximum Matching in Turnstile Streams, 2015, ESA.

[14]  Piotr Indyk, et al. Comparing Data Streams Using Hamming Norms (How to Zero In), 2002, VLDB.

[15]  Mohammad Ghodsi, et al. New Streaming Algorithms for Counting Triangles in Graphs, 2005, COCOON.

[16]  Piotr Indyk, et al. Sampling in dynamic data streams and applications, 2005, Int. J. Comput. Geom. Appl.

[17]  Shachar Lovett, et al. Optimality of linear sketching under modular updates, 2018, Electron. Colloquium Comput. Complex.

[18]  Graham Cormode, et al. An improved data stream summary: the count-min sketch and its applications, 2004, J. Algorithms.

[19]  David P. Woodruff, et al. New Characterizations in Turnstile Streams with Applications, 2016, CCC.

[20]  Kun-Lung Wu, et al. Counting and Sampling Triangles from a Graph Stream, 2013, Proc. VLDB Endow.

[21]  Moses Charikar, et al. Finding frequent items in data streams, 2002, Theor. Comput. Sci.

[22]  Yin Tat Lee, et al. Single Pass Spectral Sparsification in Dynamic Streams, 2014, IEEE 55th Annual Symposium on Foundations of Computer Science.

[23]  Bruce M. Kapron, et al. Dynamic graph connectivity in polylogarithmic worst case time, 2013, SODA.

[24]  Charalampos E. Tsourakakis, et al. Colorful triangle counting and a MapReduce implementation, 2011, Inf. Process. Lett.

[25]  David P. Woodruff, et al. Applications of the Shannon-Hartley theorem to data streams and sparse recovery, 2012, IEEE International Symposium on Information Theory Proceedings.

[26]  David P. Woodruff, et al. Turnstile streaming algorithms might as well be linear sketches, 2014, STOC.

[27]  Yang Li, et al. Maximum Matchings in Dynamic Graph Streams and the Simultaneous Communication Model, 2016, SODA.

[28]  Christian Sohler, et al. Coresets in dynamic geometric data streams, 2005, STOC '05.

[29]  Eric Price, et al. A Hybrid Sampling Scheme for Triangle Counting, 2016, SODA.