Estimating Sparse Discrete Distributions Under Local Privacy and Communication Constraints

We consider the task of estimating sparse discrete distributions under local differential privacy and communication constraints. Under local privacy constraints, we present a sample-optimal private-coin scheme in which each user sends only a one-bit message. Under communication constraints, we present a public-coin scheme based on random hash functions, which we prove is optimal up to logarithmic factors. Our results show that the sample complexity depends only logarithmically on the ambient dimension, a significant improvement in sample complexity under sparsity assumptions. Our lower bounds are based on a recently proposed chi-squared contraction method.
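To give a concrete sense of what a one-bit locally private message can look like, the following minimal sketch uses standard binary randomized response. It is an illustration only and is not the scheme of this paper: the function names `randomized_response` and `debias_mean`, and the choice to privatize a single membership bit per user, are assumptions made for the example.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Report `bit` truthfully with probability e^eps / (1 + e^eps), else flip it.

    This is the classical epsilon-locally-private mechanism for a single bit.
    """
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < keep_prob else 1 - bit


def debias_mean(reports, epsilon):
    """Unbiased estimate of the true fraction of ones from noisy one-bit reports.

    Assumes epsilon > 0, so that the denominator below is nonzero.
    """
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = (2 * keep_prob - 1) * true_fraction + (1 - keep_prob)
    return (observed - (1.0 - keep_prob)) / (2.0 * keep_prob - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: each user's bit says whether their sample lies in a fixed subset.
    true_bits = [1] * 3000 + [0] * 7000
    reports = [randomized_response(b, 1.0, rng) for b in true_bits]
    print(debias_mean(reports, 1.0))
```

Each user transmits exactly one bit, and the server only needs the average of the reports to recover the probability of the chosen subset. Estimating a full sparse distribution requires a careful choice of which subset each user reports on, which this sketch does not attempt.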
