Streaming symmetric norms via measure concentration

We characterize the streaming space complexity of every symmetric norm l (a norm on ℝ^n invariant under sign-flips and coordinate-permutations), by relating this space complexity to the measure-concentration characteristics of l. Specifically, we provide nearly matching upper and lower bounds on the space complexity of calculating a (1 ± ε)-approximation to the norm of the stream, for every 0 < ε ≤ 1/2. (The bounds match up to poly(ε^{-1} log n) factors.) We further extend those bounds to any large approximation ratio D ≥ 1.1, showing that the decrease in space complexity is proportional to D^2, and that this factor is the best possible. All of the bounds depend on the median of l(x) when x is drawn uniformly from the l_2 unit sphere. The same median governs many phenomena in high-dimensional spaces, such as large-deviation bounds and the critical dimension in Dvoretzky's Theorem. The family of symmetric norms contains several well-studied norms, such as all l_p norms, and indeed we provide a new explanation for the disparity in space complexity between p ≤ 2 and p > 2. In addition, we apply our general results to easily derive bounds for several norms that were not studied before in the streaming model, including the top-k norm and the k-support norm, which was recently employed for machine learning tasks. Overall, these results make progress on two outstanding problems in the area of sublinear algorithms (Problems 5 and 30 in http://sublinear.info).
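To make the central quantity concrete, the following is a minimal Monte Carlo sketch (not the paper's algorithm) that estimates the median of l(x) for x uniform on the l_2 unit sphere. The function names, the sampling method (normalizing a Gaussian vector), and the defaults for `trials` and `seed` are illustrative assumptions; the precise way the space bounds depend on this median is not reproduced here.

```python
import math
import random
import statistics


def random_unit_vector(n, rng):
    """Sample uniformly from the l_2 unit sphere by normalizing a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    length = math.sqrt(sum(v * v for v in g))
    return [v / length for v in g]


def lp_norm(x, p):
    """The l_p norm for finite p >= 1."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def top_k_norm(x, k):
    """Sum of the k largest coordinates in absolute value."""
    return sum(sorted((abs(v) for v in x), reverse=True)[:k])


def estimate_median(norm, n, trials=200, seed=0):
    """Empirical median of norm(x) over `trials` random points on the sphere."""
    rng = random.Random(seed)
    samples = [norm(random_unit_vector(n, rng)) for _ in range(trials)]
    return statistics.median(samples)


if __name__ == "__main__":
    n = 100
    for p in (1, 2, 4):
        print(f"l_{p} median:", estimate_median(lambda x: lp_norm(x, p), n))
    for k in (1, 10, n):
        print(f"top-{k} median:", estimate_median(lambda x: top_k_norm(x, k), n))
```

Both example norms are symmetric in the sense of the abstract: their values are unchanged by sign-flips and permutations of the coordinates. The top-k norm coincides with the l_1 norm when k = n and with the maximum absolute coordinate when k = 1.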
