Locally Differentially Private Sparse Vector Aggregation

Vector mean estimation is a central primitive in federated analytics. In vector mean estimation, each user i ∈ [n] holds a real-valued vector v_i ∈ [−1, 1]^d, and a server wants to estimate the mean of all n vectors. Moreover, we would like to protect each individual user's privacy. In this paper, we consider the k-sparse version of the vector mean estimation problem: each user's d-dimensional vector has at most k non-zero coordinates, where k ≪ d. In practice, since the universe size d can be very large (e.g., the space of all possible URLs), we would like the per-user communication to be succinct, i.e., independent of or (poly-)logarithmic in the universe size. We are the first to show matching upper and lower bounds for the k-sparse vector mean estimation problem under local differential privacy. Specifically, we construct new mechanisms that achieve asymptotically optimal error as well as succinct communication, under either user-level LDP or event-level LDP. We implement our algorithms and evaluate them on synthetic as well as real-world datasets. Our experiments show that we can often achieve one or two orders of magnitude reduction in error in comparison with prior works under typical choices of parameters, while incurring insignificant communication cost.
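To make the problem setting concrete, the following is a minimal Python sketch of a simple baseline, not the mechanism proposed in this paper. Each user samples one of the d coordinates uniformly at random (independently of the data) and reports its value through a standard one-bit randomizer for scalars in [−1, 1] in the style of Duchi et al. The message is an index plus one bit, so communication is about log2(d) + 1 bits, and the whole report is ε-LDP at the user level. All function names (`plus_probability`, `perturb_scalar`, `client_report`, `server_estimate`) and the dict-based sparse-vector representation are hypothetical choices made for illustration.

```python
import math
import random
from collections import defaultdict


def plus_probability(x: float, eps: float) -> float:
    """Probability of reporting +B for an input x in [-1, 1]."""
    e = math.exp(eps)
    return 0.5 + x * (e - 1) / (2 * (e + 1))


def perturb_scalar(x: float, eps: float, rng: random.Random) -> float:
    """One-bit eps-LDP randomizer for x in [-1, 1] (Duchi et al.-style).

    Outputs +B or -B with B = (e^eps + 1) / (e^eps - 1); the output is an
    unbiased estimate of x.
    """
    e = math.exp(eps)
    bound = (e + 1) / (e - 1)
    return bound if rng.random() < plus_probability(x, eps) else -bound


def client_report(v: dict[int, float], d: int, eps: float,
                  rng: random.Random) -> tuple[int, float]:
    """Report one uniformly sampled coordinate of a sparse vector.

    v maps coordinate index -> value in [-1, 1]; missing indices are 0.
    The index is drawn independently of the data, so only the perturbed
    value consumes privacy budget.
    """
    j = rng.randrange(d)
    return j, perturb_scalar(v.get(j, 0.0), eps, rng)


def server_estimate(reports: list[tuple[int, float]], n: int,
                    d: int) -> dict[int, float]:
    """Unbiased mean estimate; coordinates absent from the result are 0."""
    sums: dict[int, float] = defaultdict(float)
    for j, y in reports:
        sums[j] += y
    # Each coordinate is sampled with probability 1/d, hence the factor d.
    return {j: d * s / n for j, s in sums.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    n, d, eps = 50_000, 10, 2.0
    # Every (hypothetical) user holds the same 2-sparse vector.
    users = [{2: 1.0, 7: -0.5} for _ in range(n)]
    reports = [client_report(v, d, eps, rng) for v in users]
    est = server_estimate(reports, n, d)
    print({j: round(est.get(j, 0.0), 2) for j in (0, 2, 7)})
```

This baseline achieves succinct communication but does not exploit sparsity: because each coordinate is observed with probability only 1/d, the per-coordinate variance is roughly d·B²/n and grows with the universe size d. Removing this dependence on d while keeping communication succinct is exactly the regime the abstract targets.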