User-Level Private Learning via Correlated Sampling

Most works in learning with differential privacy (DP) have focused on the setting where each user has a single sample. In this work, we consider the setting where each user holds m samples and privacy protection is enforced at the level of each user's data. We show that, in this setting, we can learn with far fewer users. Specifically, we show that, as long as each user receives sufficiently many samples, we can learn any privately learnable class via an (ε, δ)-DP algorithm using only O(log(1/δ)/ε) users. For ε-DP algorithms, we show that we can learn using only O_ε(d) users even in the local model, where d is the probabilistic representation dimension. In both cases, we show a nearly matching lower bound on the number of users required. A crucial component of our results is a generalization of global stability [BLM20] that allows the use of public randomness. Under this relaxed notion, we employ a correlated sampling strategy to show that global stability can be boosted to be arbitrarily close to one, at a polynomial expense in the number of samples.
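To make the correlated-sampling ingredient concrete, below is a minimal sketch of the classical rejection-based correlated-sampling primitive for discrete distributions, in the spirit of [31, 44, 55]. It illustrates the primitive only, not this paper's boosting algorithm, and the names correlated_sample, prob, and domain are ours for illustration. All parties seed a generator with the same public randomness and scan the same stream of (proposal, threshold) pairs; each outputs the first proposal accepted under its own distribution, so parties holding close distributions output the same element with high probability.

    import random

    def correlated_sample(prob, domain, shared_seed, max_rounds=1_000_000):
        """Sample from `prob` (a dict mapping domain elements to
        probabilities) via rejection sampling driven entirely by the
        shared seed. Parties with identical seeds consume identical
        (proposal, threshold) streams, so their first acceptances
        tend to coincide when their distributions are close."""
        rng = random.Random(shared_seed)
        for _ in range(max_rounds):
            x = rng.choice(domain)  # shared uniform proposal
            u = rng.random()        # shared threshold in [0, 1)
            if u < prob[x]:         # accept w.r.t. own distribution
                return x
        raise RuntimeError("no acceptance within max_rounds")

    # Two parties with close distributions and the same public seed:
    p = {"a": 0.5, "b": 0.3, "c": 0.2}
    q = {"a": 0.5, "b": 0.25, "c": 0.25}  # d_TV(p, q) = 0.05
    seed = 1234
    print(correlated_sample(p, ["a", "b", "c"], seed),
          correlated_sample(q, ["a", "b", "c"], seed))

The standard analysis of this construction shows that each output is marginally distributed according to the party's own distribution, while the two outputs differ with probability at most 2·d_TV(p, q) / (1 + d_TV(p, q)).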

[1] Alex Kulesza et al. Learning with User-Level Privacy. NeurIPS, 2021.

[2] Kobbi Nissim et al. Simultaneous Private Learning of Multiple Concepts. ITCS, 2015.

[3] Kunal Talwar et al. Mechanism Design via Differential Privacy. FOCS, 2007.

[4] Leslie G. Valiant. A theory of the learnable. STOC, 1984.

[5] Sanjiv Kumar et al. Learning discrete distributions: user vs item-level privacy. NeurIPS, 2020.

[6] Badih Ghazi et al. Private Aggregation from Fewer Anonymous Messages. EUROCRYPT, 2019.

[7] Úlfar Erlingsson et al. Prochlo: Strong Privacy for Analytics in the Crowd. SOSP, 2017.

[8] William K. C. Lam et al. Differentially Private SQL with Bounded User Contribution. Proc. Priv. Enhancing Technol., 2019.

[9] Martin J. Wainwright et al. Local privacy and statistical minimax rates. Allerton, 2013.

[10] Úlfar Erlingsson et al. Encode, Shuffle, Analyze Privacy Revisited: Formalizations and Empirical Evaluation. arXiv, 2020.

[11] Michael Kearns. Efficient noise-tolerant learning from statistical queries. STOC, 1993.

[12] Badih Ghazi et al. On the Power of Multiple Anonymous Messages: Frequency Estimation and Selection in the Shuffle Model of Differential Privacy. EUROCRYPT, 2021.

[13] Salil P. Vadhan. The Complexity of Differential Privacy. Tutorials on the Foundations of Cryptography, 2017.

[14] Noga Alon et al. Private PAC learning implies finite Littlestone dimension. STOC, 2018.

[15] Sofya Raskhodnikova et al. What Can We Learn Privately? FOCS, 2008.

[16] Adrià Gascón et al. Private Summation in the Multi-Message Shuffle Model. CCS, 2020.

[17] Éva Tardos et al. Approximation algorithms for classification problems with pairwise relationships: metric labeling and Markov random fields. JACM, 2002.

[18] Anand D. Sarwate et al. Differentially Private Empirical Risk Minimization. J. Mach. Learn. Res., 2009.

[19] Santosh S. Vempala et al. A simple polynomial-time rescaling algorithm for solving linear programs. STOC, 2004.

[20] Yang Song et al. Beyond Inferring Class Representatives: User-Level Privacy Leakage From Federated Learning. INFOCOM, 2019.

[21] Sofya Raskhodnikova et al. Analyzing Graphs with Node Differential Privacy. TCC, 2013.

[22] S. L. Warner. Randomized response: a survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 1965.

[23] Cynthia Dwork et al. Calibrating Noise to Sensitivity in Private Data Analysis. TCC, 2006.

[24] Amos Beimel et al. Characterizing the Sample Complexity of Pure Private Learners. J. Mach. Learn. Res., 2019.

[25] Moni Naor et al. Our Data, Ourselves: Privacy Via Distributed Noise Generation. EUROCRYPT, 2006.

[26] Ronald L. Rivest. Learning decision lists. Machine Learning, 1987.

[27] Albert Cheu et al. Differentially Private Histograms in the Shuffle Model from Fake Users. arXiv, 2021.

[28] Úlfar Erlingsson et al. Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity. SODA, 2018.

[29] Vitaly Feldman et al. Locally Private Learning without Interaction Requires Separation. NeurIPS, 2019.

[30] Úlfar Erlingsson et al. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. CCS, 2014.

[31] Andrei Z. Broder. On the resemblance and containment of documents. SEQUENCES, 1997.

[32] Cynthia Dwork et al. Calibrating Noise to Sensitivity in Private Data Analysis. J. Priv. Confidentiality, 2016.

[33] Ian Goodfellow et al. Deep Learning with Differential Privacy. CCS, 2016.

[34] Adam D. Smith et al. Distributed Differential Privacy via Shuffling. IACR Cryptol. ePrint Arch., 2018.

[35] Badih Ghazi et al. Differentially Private Aggregation in the Shuffle Model: Almost Central Accuracy in Almost a Single Message. ICML, 2021.

[36] Victor Balcer et al. Separating Local & Shuffled Differential Privacy via Histograms. ITC, 2020.

[37] Badih Ghazi et al. Sample-efficient proper PAC learning with approximate differential privacy. STOC, 2021.

[38] John M. Abowd. The U.S. Census Bureau Adopts Differential Privacy. KDD, 2018.

[39] Badih Ghazi et al. Pure Differentially Private Summation from Anonymous Messages. ITC, 2020.

[40] Noah Golowich et al. Differentially Private Nonparametric Regression Under a Growth Condition. COLT, 2021.

[41] H. Brendan McMahan et al. Generative Models for Effective ML on Private, Decentralized Datasets. ICLR, 2019.

[42] Raef Bassily et al. Practical Locally Private Heavy Hitters. NIPS, 2017.

[43] Nina Mishra et al. Releasing search queries and clicks privately. WWW, 2009.

[44] Moses Charikar. Similarity estimation techniques from rounding algorithms. STOC, 2002.

[45] Richard Nock et al. Advances and Open Problems in Federated Learning. Found. Trends Mach. Learn., 2021.

[46] Janardhan Kulkarni et al. Collecting Telemetry Data Privately. NIPS, 2017.

[47] Roi Livni et al. An Equivalence Between Private Classification and Online Prediction. FOCS, 2020.

[48] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm. FOCS, 1987.

[49] Russell Impagliazzo et al. Reproducibility in learning. STOC, 2022.

[50] Aaron Roth et al. The Algorithmic Foundations of Differential Privacy. Found. Trends Theor. Comput. Sci., 2014.

[51] H. Brendan McMahan et al. Learning Differentially Private Recurrent Language Models. ICLR, 2017.

[52] Vahab S. Mirrokni et al. Smoothly Bounding User Contributions in Differential Privacy. NeurIPS, 2020.

[53] Yishay Mansour et al. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. STOC, 1994.

[54] Kunal Talwar et al. On the geometry of differential privacy. STOC, 2010.

[55] Thomas Holenstein. Parallel repetition: simplifications and the no-signaling case. STOC, 2007.

[56] Badih Ghazi et al. Private Counting from Anonymous Messages: Near-Optimal Accuracy with Vanishing Communication Overhead. ICML, 2020.

[57] Sergei Vassilvitskii et al. Bounding User Contributions: A Bias-Variance Trade-off in Differential Privacy. ICML, 2019.