The Power of the Hybrid Model for Mean Estimation

Abstract: We explore the power of the hybrid model of differential privacy (DP), in which some users desire the guarantees of the local model of DP and others are content with receiving the trusted-curator model guarantees. In particular, we study the utility of hybrid model estimators that compute the mean of arbitrary real-valued distributions with bounded support. When the curator knows the distribution's variance, we design a hybrid estimator that, for realistic datasets and parameter settings, achieves a constant factor improvement over natural baselines. We then analytically characterize how the estimator's utility depends on the problem setting and parameter choices. When the distribution's variance is unknown, we design a heuristic hybrid estimator and analyze how it compares to the baselines. We find that it often performs better than the baselines, and sometimes almost as well as the known-variance estimator. We next ask how our estimator's utility is affected when users' data are not drawn from the same distribution, but rather from distributions that depend on their trust model preference. Concretely, we examine the implications of the two groups' distributions diverging and show that in some cases, our estimators maintain fairly high utility. We also demonstrate how our hybrid estimator can be incorporated as a sub-component in more complex, higher-dimensional applications. Finally, we propose a new privacy amplification notion for the hybrid model that emerges due to interaction between the groups, and derive corresponding amplification results for our hybrid estimators.
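The abstract does not spell out the form of the known-variance estimator, so the following is only a minimal sketch of one natural way to combine the two groups, under stated assumptions: data lie in [0, m], both groups use the Laplace mechanism with the same privacy parameter eps, and the two per-group estimates are merged by inverse-variance weighting. All function names and parameters here are hypothetical and chosen for illustration; this is not claimed to be the paper's exact construction.

```python
import random

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def tcm_estimate(xs, m, eps, rng):
    # Trusted-curator group: noise is added once to the exact mean.
    # The mean of n values in [0, m] has sensitivity m / n.
    n = len(xs)
    return sum(xs) / n + laplace(m / (eps * n), rng)

def lm_estimate(xs, m, eps, rng):
    # Local group: every user perturbs their own value (sensitivity m)
    # before reporting; the curator averages the noisy reports.
    n = len(xs)
    return sum(x + laplace(m / eps, rng) for x in xs) / n

def tcm_variance(n, m, eps, sigma2):
    # Sampling variance plus Laplace variance (2 * scale^2).
    return sigma2 / n + 2.0 * (m / (eps * n)) ** 2

def lm_variance(n, m, eps, sigma2):
    # Sampling variance plus the averaged per-user Laplace variance.
    return sigma2 / n + 2.0 * (m / eps) ** 2 / n

def hybrid_weight(var_t, var_l):
    # Weight on the TCM estimate that minimizes the variance of a convex
    # combination of two independent unbiased estimators.
    return var_l / (var_t + var_l)

def hybrid_estimate(tcm_xs, lm_xs, m, eps, sigma2, rng):
    # sigma2 is the distribution's variance, assumed known to the curator.
    var_t = tcm_variance(len(tcm_xs), m, eps, sigma2)
    var_l = lm_variance(len(lm_xs), m, eps, sigma2)
    w = hybrid_weight(var_t, var_l)
    return w * tcm_estimate(tcm_xs, m, eps, rng) + (1.0 - w) * lm_estimate(lm_xs, m, eps, rng)
```

With this weighting the combined variance is var_t * var_l / (var_t + var_l), which is strictly smaller than either group's variance alone; that is the sense in which pooling the two trust groups can beat using only one of them in this sketch.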
