Mean Estimation with User-level Privacy under Data Heterogeneity

A key challenge in many modern data analysis tasks is that user data are heterogeneous. Different users may hold vastly different numbers of data points and, more importantly, cannot be assumed to sample from the same underlying distribution. This is true, for example, of language data, where differing speech styles produce heterogeneous data. In this work, we propose a simple model of heterogeneous user data in which users may differ in both the distribution and the quantity of their data, and we provide a method for estimating the population-level mean while preserving user-level differential privacy. We demonstrate asymptotic optimality of our estimator and also prove general lower bounds on the error achievable in the setting we introduce.
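To make the setting concrete, the following is a minimal sketch of a naive baseline, not the estimator proposed in this work. It assumes every data point lies in [0, 1], that the number of users is public, and that neighboring datasets differ by replacing one user's entire dataset. The function names, the Gaussian spread of user means, and the range of per-user sample counts are all hypothetical choices made for illustration.

```python
import random

def simulate_users(num_users, pop_mean, spread, rng):
    """Hypothetical heterogeneous population: each user has their own
    mean and their own number of data points."""
    users = []
    for _ in range(num_users):
        # User-specific mean drawn around the population mean, clamped to [0, 1].
        mu = min(1.0, max(0.0, rng.gauss(pop_mean, spread)))
        # Users hold very different amounts of data (assumed range 1..50).
        n = rng.randint(1, 50)
        # Bernoulli(mu) samples, so every data point lies in [0, 1].
        users.append([1.0 if rng.random() < mu else 0.0 for _ in range(n)])
    return users

def private_mean(users, epsilon, rng):
    """Naive user-level DP baseline (an assumption, not the paper's method):
    average of per-user means plus Laplace noise."""
    per_user = [sum(x) / len(x) for x in users]
    avg = sum(per_user) / len(per_user)
    # Replacing one user's whole dataset changes avg by at most 1 / len(users),
    # so Laplace noise with this scale gives epsilon-DP at the user level.
    scale = 1.0 / (len(users) * epsilon)
    # The difference of two i.i.d. exponentials with rate 1/scale is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return avg + noise

rng = random.Random(0)
users = simulate_users(2000, 0.3, 0.1, rng)
estimate = private_mean(users, 1.0, rng)
print(estimate)
```

This baseline weights every user equally, regardless of how many data points they hold. It is shown only to illustrate the problem setup of user-level privacy with heterogeneous data quantity and distribution.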
