Optimality of Correlated Sampling Strategies

In the "correlated sampling" problem, two players are given probability distributions $P$ and $Q$, respectively, over the same finite set, with access to shared randomness. Without any communication, each player is required to output an element sampled according to their own distribution, while trying to minimize the probability that their outputs disagree. A well-known strategy due to Kleinberg-Tardos and Holenstein, with a close variant (for a similar problem) due to Broder, solves this task with disagreement probability at most $2\delta/(1+\delta)$, where $\delta$ is the total variation distance between $P$ and $Q$. This strategy has been used in several different contexts, including sketching algorithms, approximation algorithms based on rounding linear programming relaxations, the study of parallel repetition, and cryptography. In this paper, we give a surprisingly simple proof that this strategy is essentially optimal. Specifically, for every $\delta \in (0,1)$, we show that any correlated sampling strategy incurs a disagreement probability of essentially $2\delta/(1+\delta)$ on some inputs $P$ and $Q$ with total variation distance at most $\delta$. This partially answers a recent question of Rivest. Our proof is based on studying a new problem that we call "constrained agreement". Here, the two players are given subsets $A \subseteq [n]$ and $B \subseteq [n]$, respectively, and their goal is to output elements $i \in A$ and $j \in B$, respectively, while minimizing the probability that $i \neq j$. We prove tight bounds for this question, which in turn imply tight bounds for correlated sampling. Though we settle basic questions about the two problems, our formulation leads to more fine-grained questions that remain open.
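The following is a minimal Python sketch of the Kleinberg-Tardos/Holenstein-style strategy as it is commonly described. Both players read the same stream of pairs $(i, r)$, with $i$ uniform over the common domain and $r$ uniform in $[0,1)$, and each outputs the first $i$ with $r < P(i)$ (respectively $r < Q(i)$). The function names, the use of a seeded `random.Random` as a stand-in for the shared random string, and the example distributions are illustrative assumptions, not taken from the paper.

```python
import random
from typing import Sequence


def correlated_sample(dist: Sequence[float], seed: int) -> int:
    """Sample an index of `dist` using shared randomness identified by `seed`.

    Each call rebuilds the same pseudorandom stream of (candidate, threshold)
    pairs from `seed`, so two players holding different distributions but the
    same seed scan identical pairs without communicating.
    """
    rng = random.Random(seed)  # assumption: a seeded PRNG stands in for shared randomness
    n = len(dist)
    while True:
        i = rng.randrange(n)  # candidate element, uniform over the common domain
        r = rng.random()      # threshold, uniform in [0, 1)
        if r < dist[i]:       # accept the first candidate lying under this player's mass
            return i


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two distributions on the same domain."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def disagreement_rate(p: Sequence[float], q: Sequence[float], trials: int = 20000) -> float:
    """Fraction of shared seeds on which the two players' outputs differ."""
    differ = sum(
        correlated_sample(p, seed) != correlated_sample(q, seed)
        for seed in range(trials)
    )
    return differ / trials


if __name__ == "__main__":
    p = [0.5, 0.5, 0.0]
    q = [0.5, 0.25, 0.25]
    delta = total_variation(p, q)    # 0.25 for this pair
    bound = 2 * delta / (1 + delta)  # 0.4
    print(f"delta={delta}, bound={bound:.3f}, empirical={disagreement_rate(p, q):.3f}")
```

Each round accepts with probability $1/n$ and, given acceptance, returns $i$ with probability $P(i)$, so every player's output has exactly its own distribution. The first pair that falls under $\max(P,Q)$ is accepted by both players with probability $\sum_i \min(P(i),Q(i)) / \sum_i \max(P(i),Q(i)) = (1-\delta)/(1+\delta)$, which gives the $1 - (1-\delta)/(1+\delta) = 2\delta/(1+\delta)$ bound. A player who rejects that pair may still land on the same element later, so on a particular input the observed rate can fall strictly below the bound.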

[1] Udi Manber et al. Finding Similar Files in a Large File System. 1994, USENIX Winter.

[2] Anup Rao et al. Parallel repetition in projection games and a concentration bound. 2008, SIAM J. Comput.

[3] Sreenivas Gollapudi et al. A dictionary for approximate string search and longest prefix search. 2006, CIKM '06.

[4] Eli Upfal et al. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. 2005.

[5] Kunal Talwar et al. Consistent Weighted Sampling. 2007.

[6] H. Thorisson. Coupling, Stationarity, and Regeneration. 2000.

[7] Bernhard Haeupler et al. Consistent Weighted Sampling Made Fast, Small, and Easy. 2014, arXiv.

[8] Éva Tardos et al. Approximation algorithms for classification problems with pairwise relationships: metric labeling and Markov random fields. 2002, J. ACM.

[9] Andrei Z. Broder et al. On the resemblance and containment of documents. 1997, Proceedings of Compression and Complexity of SEQUENCES 1997.

[10] David Steurer et al. Rounding Parallel Repetitions of Unique Games. 2008, 49th Annual IEEE Symposium on Foundations of Computer Science.

[11] Thomas Holenstein. Parallel Repetition: Simplification and the No-Signaling Case. 2009, Theory of Computing.

[12] Moses Charikar et al. Similarity estimation techniques from rounding algorithms. 2002, STOC '02.

[13] Thomas Holenstein et al. Parallel repetition: simplifications and the no-signaling case. 2007, STOC '07.