Communication Complexity in Locally Private Distribution Estimation and Heavy Hitters

We consider the problems of distribution estimation and heavy hitter (frequency) estimation under privacy and communication constraints. While these constraints have been studied separately, schemes that are optimal for one are sub-optimal for the other. We propose a sample-optimal $\varepsilon$-locally differentially private (LDP) scheme for distribution estimation in which each user communicates only one bit and which requires no public randomness. We show that Hadamard Response, a recently proposed scheme for $\varepsilon$-LDP distribution estimation, is also utility-optimal for heavy hitter estimation. Finally, we show that, unlike distribution estimation, where one bit per user suffices even without public randomness, any heavy hitter estimation algorithm without public randomness that communicates $o(\min \{\log n, \log k\})$ bits from each user cannot be optimal.
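To make the one-bit, no-public-randomness setting concrete, the following is a minimal sketch of a Hadamard-based one-bit $\varepsilon$-LDP estimator. It is an illustration under stated assumptions, not necessarily the exact construction analyzed in the paper: each user is assigned a Hadamard row deterministically from their index, sends the corresponding $\pm 1$ entry passed through binary randomized response, and the server debiases and inverts the Hadamard transform. All function names (`hadamard_entry`, `padded_size`, `one_bit_report`, `estimate_distribution`) and the modular row assignment are hypothetical choices made for this example.

```python
import math
import random


def hadamard_entry(row: int, col: int) -> int:
    """Entry (row, col) of the Sylvester Hadamard matrix, in {+1, -1}."""
    return -1 if bin(row & col).count("1") % 2 else 1


def padded_size(k: int) -> int:
    """Smallest power of two strictly greater than k.

    Column 0 of the Hadamard matrix is all ones and carries no information,
    so symbol x is mapped to column x + 1 (an assumption of this sketch).
    """
    size = 1
    while size <= k:
        size *= 2
    return size


def one_bit_report(user_index: int, symbol: int, k: int, eps: float,
                   rng: random.Random) -> int:
    """Return the single privatized bit (+1 or -1) sent by one user."""
    size = padded_size(k)
    row = user_index % size  # deterministic assignment, no public randomness
    true_bit = hadamard_entry(row, symbol + 1)
    # Binary randomized response: keep the bit with probability e^eps / (e^eps + 1).
    keep_prob = math.exp(eps) / (math.exp(eps) + 1.0)
    return true_bit if rng.random() < keep_prob else -true_bit


def estimate_distribution(reports: list, k: int, eps: float) -> list:
    """Server side: debias per-row averages, then invert the Hadamard transform."""
    size = padded_size(k)
    sums = [0.0] * size
    counts = [0] * size
    for user_index, bit in enumerate(reports):
        sums[user_index % size] += bit
        counts[user_index % size] += 1
    # Randomized response shrinks the mean by (e^eps - 1) / (e^eps + 1); undo it.
    scale = (math.exp(eps) + 1.0) / (math.exp(eps) - 1.0)
    coeffs = [scale * sums[j] / counts[j] if counts[j] else 0.0
              for j in range(size)]
    # H is symmetric with H @ H = size * I, so the inverse transform is H / size.
    return [sum(hadamard_entry(x + 1, j) * coeffs[j] for j in range(size)) / size
            for x in range(k)]


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.4, 0.3, 0.15, 0.1, 0.05]
    samples = rng.choices(range(len(p)), weights=p, k=80000)
    reports = [one_bit_report(i, x, len(p), 1.0, rng)
               for i, x in enumerate(samples)]
    print(estimate_distribution(reports, len(p), 1.0))
```

Each report takes one of two values whose probabilities differ by a factor of at most $e^{\varepsilon}$ across inputs, so the message satisfies $\varepsilon$-LDP. The estimate is unbiased when users' samples are i.i.d., but it is not projected onto the probability simplex, so individual entries may be slightly negative.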
