Optimal lower bounds for universal relation, samplers, and finding duplicates

In the communication problem $\mathbf{UR}$ (universal relation) [KRW95], Alice and Bob respectively receive $x$ and $y$ in $\{0,1\}^n$ with the promise that $x\neq y$. The last player to receive a message must output an index $i$ such that $x_i\neq y_i$. We prove that the randomized one-way communication complexity of this problem in the public coin model is $\Theta(\min\{n, \log(1/\delta)\log^2(\frac{n}{\log(1/\delta)})\})$ bits for failure probability $\delta$. Our lower bound holds even under the promise that $\operatorname{support}(y)\subset \operatorname{support}(x)$. As a corollary, we obtain optimal lower bounds for $\ell_p$-sampling in strict turnstile streams for $0\le p < 2$, as well as for the problem of finding duplicates in a stream. Our lower bounds do not need to use large weights, and hold even if it is promised that $x\in\{0,1\}^n$ at all points in the stream.

Our lower bound demonstrates that any algorithm $\mathcal{A}$ solving sampling problems in turnstile streams in low memory can be used to encode subsets of $[n]$ of certain sizes into a number of bits below the information-theoretic minimum. Our encoder makes adaptive queries to $\mathcal{A}$ throughout its execution, but does so carefully so as not to violate correctness. This is accomplished by injecting random noise into the encoder's interactions with $\mathcal{A}$, an approach loosely motivated by techniques in differential privacy. Our correctness analysis involves understanding the ability of $\mathcal{A}$ to correctly answer adaptive queries that have positive but bounded mutual information with $\mathcal{A}$'s internal randomness, and may be of independent interest in the newly emerging area of adaptive data analysis viewed through a theoretical computer science lens.
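To make the problem statement concrete, the following is a minimal sketch of a one-way public-coin protocol for $\mathbf{UR}$ in the style of standard $\ell_0$-sampling: Alice sends a linear sketch of $x$, and Bob subtracts the same sketch of $y$ to look for a subsampling level at which $x-y$ has exactly one nonzero coordinate. It illustrates the problem and its link to samplers only; it is not the lower-bound argument of this paper, and it makes no attempt to match the tight bound. All names (`shared_coins`, `sketch`, `bob_output`, `ur_one_way`), the modulus `P`, and the default number of repetitions are assumptions introduced for illustration.

```python
import random

P = (1 << 61) - 1  # prime modulus for fingerprints (2^61 - 1 is a Mersenne prime)


def _depth(rng, levels):
    """Geometric sampling depth, capped at levels - 1."""
    d = 0
    while d < levels - 1 and rng.random() < 0.5:
        d += 1
    return d


def shared_coins(n, seed, reps):
    """Public randomness: per repetition, a depth per coordinate and a fingerprint base."""
    rng = random.Random(seed)
    levels = n.bit_length() + 1
    coins = []
    for _ in range(reps):
        depth = [_depth(rng, levels) for _ in range(n)]
        coins.append((depth, rng.randrange(2, P)))
    return levels, coins


def sketch(v, levels, coins):
    """Linear sketch: at each level keep (count, sum of 1-based indices, fingerprint)."""
    out = []
    for depth, r in coins:
        rows = [[0, 0, 0] for _ in range(levels)]
        for i, bit in enumerate(v):
            if bit:
                w = pow(r, i + 1, P)
                # Coordinate i belongs to level j exactly when depth[i] >= j.
                for j in range(depth[i] + 1):
                    rows[j][0] += 1
                    rows[j][1] += i + 1
                    rows[j][2] = (rows[j][2] + w) % P
        out.append(rows)
    return out


def bob_output(alice_msg, y, levels, coins):
    """Bob subtracts his own sketch and looks for a level where x - y is 1-sparse."""
    mine = sketch(y, levels, coins)
    for (depth, r), rows_a, rows_b in zip(coins, alice_msg, mine):
        for j in range(levels):
            c = rows_a[j][0] - rows_b[j][0]
            s = rows_a[j][1] - rows_b[j][1]
            f = (rows_a[j][2] - rows_b[j][2]) % P
            if c in (1, -1):
                i = s * c - 1  # candidate 0-based index
                # The fingerprint check rejects levels that are not truly 1-sparse,
                # except with small probability over the choice of r.
                if 0 <= i < len(y) and depth[i] >= j and f == (c * pow(r, i + 1, P)) % P:
                    return i
    return None  # every repetition failed to isolate a coordinate


def ur_one_way(x, y, seed=0, reps=16):
    levels, coins = shared_coins(len(x), seed, reps)
    message = sketch(x, levels, coins)  # the only communication: Alice to Bob
    return bob_output(message, y, levels, coins)
```

Alice's message here consists of `reps * levels` triples of counters. Adding repetitions lowers the chance that no level isolates a single differing coordinate, which is the role the failure probability $\delta$ plays in the bound above.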
