Continuous monitoring of $\ell_p$ norms in data streams

In insertion-only streaming, one sees a sequence of indices $a_1, a_2, \ldots, a_m\in [n]$. The stream defines a sequence of $m$ frequency vectors $x^{(1)},\ldots,x^{(m)}\in\mathbb{R}^n$ with $(x^{(t)})_i = |\{j : j\in[t], a_j = i\}|$. That is, $x^{(t)}$ is the frequency vector after seeing the first $t$ items in the stream. Much work in the streaming literature focuses on estimating some function $f(x^{(m)})$ of the final vector. Many applications, though, require an estimate of $f(x^{(t)})$ at every time $t\in[m]$. Naively, this guarantee is obtained by devising an algorithm with failure probability $\ll 1/m$, then taking a union bound over all $m$ stream updates so that all $m$ estimates are simultaneously accurate with good probability. When $f(x)$ is some $\ell_p$ norm of $x$, recent works have shown that this union bound is wasteful and that better space complexity is possible for the continuous monitoring problem, with the strongest known results being for $p=2$ [HTY14, BCIW16, BCINWW17]. In this work, we improve the state of the art for all $0
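To make the definitions above concrete, the following minimal sketch computes the exact value of $\|x^{(t)}\|_p$ after every update of an insertion-only stream. It is an illustrative baseline only, not the algorithm of this work: it stores one counter per distinct item, so it uses linear space, whereas the streaming algorithms discussed here approximate the same sequence of values in small space. The function name `track_lp_norms` and its parameters are assumptions introduced for this example.

```python
from collections import Counter


def track_lp_norms(stream, p):
    """Yield the exact l_p norm of x^(t) for t = 1, ..., m (requires p > 0).

    Hypothetical helper for illustration: an exact, linear-space baseline,
    not a small-space sketch.
    """
    counts = Counter()   # counts[i] = (x^(t))_i, the frequency of item i so far
    power_sum = 0.0      # running value of sum_i (x^(t))_i ** p
    for a in stream:
        old = counts[a]
        counts[a] = old + 1
        # Only coordinate a changes, so update its contribution in O(1).
        power_sum += (old + 1) ** p - old ** p
        yield power_sum ** (1.0 / p)


if __name__ == "__main__":
    # Stream a_1..a_5 over [n] with n = 3; prints the l_2 norm at each time t.
    for t, norm in enumerate(track_lp_norms([1, 2, 1, 3, 1], p=2), start=1):
        print(t, norm)
```

A continuous monitoring algorithm must output a good approximation to every value in this sequence simultaneously, which is the guarantee the naive union bound over all $m$ time steps provides.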

[1]  Philippe Flajolet,et al.  Probabilistic Counting Algorithms for Data Base Applications , 1985, J. Comput. Syst. Sci.

[2]  Douglas B. Terry,et al.  Continuous queries over append-only databases , 1992, SIGMOD '92.

[3]  Noga Alon,et al.  The Space Complexity of Approximating the Frequency Moments , 1999 .

[4]  Jennifer Widom,et al.  Continuous queries over data streams , 2001, SIGMOD Rec.

[5]  Ziv Bar-Yossef,et al.  An information statistics approach to data stream and communication complexity , 2002, The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings.

[6]  Michael Stonebraker,et al.  Monitoring Streams - A New Class of Data Management Applications , 2002, VLDB.

[7]  David P. Woodruff,et al.  Tight lower bounds for the distinct elements problem , 2003, 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings.

[8]  David P. Woodruff.  Optimal space lower bounds for all frequency moments , 2004, SODA '04.

[9]  David P. Woodruff,et al.  Optimal approximations of the frequency moments of data streams , 2005, STOC '05.

[10]  Piotr Indyk,et al.  Stable distributions, pseudorandom generators, embeddings, and data stream computation , 2006, JACM.

[11]  S. Muthukrishnan,et al.  Estimating Entropy and Entropy Norm on Data Streams , 2006, STACS.

[12]  Sudipto Guha,et al.  Sketching information divergences , 2007, Machine Learning.

[13]  Piotr Indyk,et al.  Declaring independence via the sketching of sketches , 2008, SODA '08.

[14]  André Gronemeier,et al.  Asymptotically Optimal Lower Bounds on the NIH-Multi-Party Information Complexity of the AND-Function and Disjointness , 2009, STACS.

[15]  David P. Woodruff,et al.  The Data Stream Space Complexity of Cascaded Norms , 2009, 2009 50th Annual IEEE Symposium on Foundations of Computer Science.

[16]  Ping Li,et al.  Compressed counting , 2008, SODA.

[17]  T. S. Jayram.  Hellinger Strikes Back: A Note on the Multi-party Information Complexity of AND , 2009, APPROX-RANDOM.

[18]  Daniel M. Kane,et al.  Bounded Independence Fools Degree-2 Threshold Functions , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[19]  R. Ostrovsky,et al.  Zero-one frequency laws , 2010, STOC '10.

[20]  Rafail Ostrovsky,et al.  Measuring independence of datasets , 2009, STOC '10.

[21]  Graham Cormode,et al.  A near-optimal algorithm for estimating the entropy of a stream , 2010, TALG.

[22]  David P. Woodruff,et al.  On the exact space complexity of sketching and streaming small norms , 2010, SODA '10.

[23]  David P. Woodruff,et al.  An optimal algorithm for the distinct elements problem , 2010, PODS '10.

[24]  David P. Woodruff,et al.  Fast Manhattan sketches in data streams , 2010, PODS '10.

[25]  Jelani Nelson,et al.  Sketching and streaming high-dimensional vectors , 2011 .

[26]  David P. Woodruff,et al.  Fast moment estimation in data streams in optimal space , 2010, STOC '11.

[27]  T. S. Jayram.  On the information complexity of cascaded norms with small domains , 2013, 2013 IEEE Information Theory Workshop (ITW).

[28]  David P. Woodruff,et al.  Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Subconstant Error , 2011, TALG.

[29]  Ke Yi,et al.  Tracking the Frequency Moments at All Times , 2014, ArXiv.

[30]  Vladimir Braverman,et al.  Universal Sketches for the Frequency Negative Moments and Other Decreasing Streaming Sums , 2014, APPROX-RANDOM.

[31]  Sumit Ganguly,et al.  Taylor Polynomial Estimator for Estimating Frequency Moments , 2015, ICALP.

[32]  Rafail Ostrovsky,et al.  Zero-One Laws for Sliding Windows and Universal Sketches , 2015, APPROX-RANDOM.

[33]  David P. Woodruff,et al.  Beating CountSketch for heavy hitters in insertion streams , 2015, STOC.

[34]  Robert Krauthgamer,et al.  Streaming symmetric norms via measure concentration , 2015, STOC.

[35]  David P. Woodruff,et al.  BPTree: An ℓ2 Heavy Hitters Algorithm Using Constant Memory , 2016, PODS.