A Multi-Round Communication Lower Bound for Gap Hamming and Some Consequences

The Gap-Hamming-Distance problem arose in the context of proving space lower bounds for a number of key problems in the data stream model. In this problem, Alice and Bob have to decide whether the Hamming distance between their $n$-bit input strings is large (i.e., at least $n/2 + \sqrt n$) or small (i.e., at most $n/2 - \sqrt n$); they do not care if it is neither large nor small. This $\Theta(\sqrt n)$ gap in the problem specification is crucial for capturing the approximation allowed to a data stream algorithm. Thus far, for randomized communication, an $\Omega(n)$ lower bound on this problem was known only in the one-way setting. We prove an $\Omega(n)$ lower bound for randomized protocols that use any constant number of rounds. As a consequence, we conclude, for instance, that $\epsilon$-approximately counting the number of distinct elements in a data stream requires $\Omega(1/\epsilon^2)$ space, even with a constant number of passes over the input stream. This extends earlier one-pass lower bounds, answering a long-standing open question. We obtain similar results for approximating the frequency moments and for approximating the empirical entropy of a data stream. In the process, we also obtain tight $n - \Theta(\sqrt{n}\log n)$ lower and upper bounds on the one-way deterministic communication complexity of the problem. Finally, we give a simple combinatorial proof of an $\Omega(n)$ lower bound on the one-way randomized communication complexity.
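To make the promise in the problem definition concrete, the following is a minimal Python sketch written for illustration, not code from the paper. The helper names `hamming`, `gap_hamming`, and `distinct_elements_stream` are hypothetical. The last helper illustrates one standard reduction (assumed here; the paper's own reduction may differ in details): if Alice streams the pairs $(i, x_i)$ and Bob streams the pairs $(i, y_i)$, then the number of distinct elements in the combined stream is exactly $n + \Delta(x, y)$, where $\Delta$ denotes Hamming distance.

```python
import math
from typing import List, Optional, Sequence, Tuple


def hamming(x: Sequence[int], y: Sequence[int]) -> int:
    """Number of positions where the bit strings x and y differ."""
    if len(x) != len(y):
        raise ValueError("inputs must have equal length")
    return sum(a != b for a, b in zip(x, y))


def gap_hamming(x: Sequence[int], y: Sequence[int]) -> Optional[int]:
    """Promise version of Gap-Hamming-Distance (illustrative sketch).

    Returns 1 if the distance is large (>= n/2 + sqrt(n)),
    0 if it is small (<= n/2 - sqrt(n)), and None when the input
    lies in the gap, where any answer is acceptable.
    """
    n = len(x)
    d = hamming(x, y)
    if d >= n / 2 + math.sqrt(n):
        return 1
    if d <= n / 2 - math.sqrt(n):
        return 0
    return None


def distinct_elements_stream(
    x: Sequence[int], y: Sequence[int]
) -> List[Tuple[int, int]]:
    """Hypothetical reduction: Alice's pairs (i, x_i) followed by Bob's (i, y_i).

    Each index i contributes one distinct pair if x_i == y_i and two
    if x_i != y_i, so the distinct count equals n + hamming(x, y).
    """
    alice = [(i, b) for i, b in enumerate(x)]
    bob = [(i, b) for i, b in enumerate(y)]
    return alice + bob
```

For example, with $n = 16$ the thresholds are $12$ and $4$: a pair of strings at distance 12 is "large", at distance 4 is "small", and at distance 8 violates the promise. Under this reduction the distinct-element count is at least $3n/2 + \sqrt n$ in the large case and at most $3n/2 - \sqrt n$ in the small case, so a $(1 \pm \epsilon)$-approximation with $\epsilon = c/\sqrt n$, for a sufficiently small constant $c$, separates the two cases; this is the sense in which $n = \Theta(1/\epsilon^2)$.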
