On the information complexity of cascaded norms with small domains

We consider the problem of estimating cascaded norms in a data stream, a well-studied generalization of the classical norm estimation problem, where the data is aggregated in a cascaded fashion along multiple attributes. We show that when each item has at most d attributes, estimating the cascaded norm L_k·L_1 requires space Ω(d·n^(1-2/k)) for every d = O(n^(1/k)). This result interpolates between the tight lower bounds known previously for the two extremes of d = 1 and d = Θ(n^(1/k)) [1]. The proof of this result uses the information complexity paradigm, which has proved successful in obtaining tight lower bounds for several well-known problems. We use the above data stream problem as motivation to sketch some of the key ideas of this paradigm. In particular, we give a unified and more general view of the key negative-type inequalities satisfied by the transcript distributions of communication protocols.
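As a small illustration of the quantities in the abstract, the sketch below computes a cascaded norm exactly and offline, and evaluates how the stated lower bound scales. It assumes the usual convention that the data forms an n × d matrix (n items, d attributes), that the inner L_1 norm is taken over each row, and that the outer L_k norm is taken over the resulting row norms; the abstract itself does not spell this out. The function names `cascaded_norm` and `lower_bound_scale` are hypothetical, and the hidden constants in Ω(·) are ignored.

```python
import math


def cascaded_norm(matrix, k, p=1):
    """Exact (offline) cascaded norm: L_k of the vector of row-wise L_p norms.

    Assumed convention: rows are items, columns are attributes.
    """
    row_norms = [sum(abs(x) ** p for x in row) ** (1.0 / p) for row in matrix]
    return sum(r ** k for r in row_norms) ** (1.0 / k)


def lower_bound_scale(n, d, k):
    """Growth rate d * n^(1 - 2/k) of the stated space lower bound.

    Constants hidden by the Omega notation are dropped, so only the
    scaling is meaningful, not the absolute value.
    """
    return d * n ** (1.0 - 2.0 / k)


if __name__ == "__main__":
    # Two items with two attributes each; both rows have L_1 norm 3.
    print(cascaded_norm([[1, -2], [3, 0]], k=2))

    # The two extremes mentioned in the abstract, for n = 64 and k = 3.
    n, k = 64, 3
    print(lower_bound_scale(n, 1, k))               # d = 1
    print(lower_bound_scale(n, n ** (1.0 / k), k))  # d = n^(1/k)
```

At d = 1 the scale reduces to n^(1-2/k), and at d = n^(1/k) it becomes n^(1-1/k), which is the sense in which the bound interpolates between the two extremes.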

[1] David P. Woodruff. Optimal space lower bounds for all frequency moments, 2004, SODA '04.

[2] Ravi Kumar, et al. The One-Way Communication Complexity of Hamming Distance, 2008, Theory Comput.

[3] David P. Woodruff, et al. The Data Stream Space Complexity of Cascaded Norms, 2009, 2009 50th Annual IEEE Symposium on Foundations of Computer Science.

[4] T. S. Jayram. Information complexity: a tutorial, 2010, PODS '10.

[5] Michel Deza, et al. Geometry of cuts and metrics, 2009, Algorithms and combinatorics.

[6] Ziv Bar-Yossef, et al. An information statistics approach to data stream and communication complexity, 2002, The 43rd Annual IEEE Symposium on Foundations of Computer Science.

[7] S. Muthukrishnan, et al. Data streams: algorithms and applications, 2005, SODA '03.

[8] Graham Cormode, et al. Space efficient mining of multigraph streams, 2005, PODS.

[9] Piotr Indyk, et al. Stable distributions, pseudorandom generators, embeddings, and data stream computation, 2006, JACM.

[10] Farid M. Ablayev, et al. Lower Bounds for One-Way Probabilistic Communication Complexity and Their Application to Space Complexity, 1996, Theor. Comput. Sci.

[11] N. J. A. Sloane, et al. The On-Line Encyclopedia of Integer Sequences, 2003, Electron. J. Comb.

[12] Jiri Matousek, et al. Lectures on discrete geometry, 2002, Graduate texts in mathematics.

[13] A. Razborov. Communication Complexity, 2011.

[14] Jianhua Lin, et al. Divergence measures based on the Shannon entropy, 1991, IEEE Trans. Inf. Theory.

[15] Philippe Flajolet, et al. Probabilistic Counting Algorithms for Data Base Applications, 1985, J. Comput. Syst. Sci.

[16] Andrew Chi-Chih Yao, et al. Some complexity questions related to distributive computing (Preliminary Report), 1979, STOC.

[17] Moses Charikar, et al. Finding frequent items in data streams, 2002, Theor. Comput. Sci.

[18] David P. Woodruff, et al. Optimal approximations of the frequency moments of data streams, 2005, STOC '05.

[19] Andrew Chi-Chih Yao, et al. Informational complexity and the direct sum problem for simultaneous message complexity, 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[20] Ran Raz, et al. Super-logarithmic depth lower bounds via direct sum in communication complexity, 1991, Proceedings of the Sixth Annual Structure in Complexity Theory Conference.

[21] Luca Trevisan, et al. Counting Distinct Elements in a Data Stream, 2002, RANDOM.

[22] Noam Nisan, et al. On Randomized One-round Communication Complexity, 1995, STOC '95.

[23] Sumit Ganguly, et al. Estimating hybrid frequency moments of data streams, 2008, J. Comb. Optim.

[24] Noga Alon, et al. The space complexity of approximating the frequency moments, 1996, STOC '96.