Collecting and Analyzing Multidimensional Data with Local Differential Privacy

Local differential privacy (LDP) is a recently proposed privacy standard for collecting and analyzing data, which has been used, e.g., in the Chrome browser, iOS, and macOS. In LDP, each user perturbs her information locally and sends only the randomized version to an aggregator who performs analyses, which protects both the users and the aggregator against private information leaks. Although LDP has attracted much research attention in recent years, the majority of existing work focuses on applying LDP to complex data and/or analysis tasks. In this paper, we point out that the fundamental problem of collecting multidimensional data under LDP has not been addressed sufficiently, and there remains much room for improvement even for basic tasks such as computing the mean value of a single numeric attribute under LDP. Motivated by this, we first propose novel LDP mechanisms for collecting a numeric attribute, whose accuracy is no worse (and usually better) than that of existing solutions in terms of worst-case noise variance. We then extend these mechanisms to multidimensional data that can contain both numeric and categorical attributes, where our mechanisms always outperform existing solutions in terms of worst-case noise variance. As a case study, we apply our solutions to build an LDP-compliant stochastic gradient descent (SGD) algorithm, which powers many important machine learning tasks. Experiments on real datasets confirm the effectiveness of our methods and their advantages over existing solutions.
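
To make the perturb-locally-then-aggregate workflow concrete, the sketch below implements one well-known existing baseline for mean estimation over a single numeric attribute in [-1, 1] (the randomized-response-style mechanism of Duchi et al.), which is among the existing solutions this paper compares against; it is not the mechanism proposed in the paper. The function names (`duchi_perturb`, `estimate_mean`) are illustrative.

```python
import math
import random

def duchi_perturb(t, eps):
    """Perturb a value t in [-1, 1] under epsilon-LDP using Duchi et al.'s
    baseline mechanism (an existing solution, not the paper's proposal).
    Returns an unbiased randomized report in {-C, +C}."""
    assert -1.0 <= t <= 1.0
    e = math.exp(eps)
    c = (e + 1.0) / (e - 1.0)                    # magnitude of the two possible outputs
    p = (e - 1.0) / (2.0 * e + 2.0) * t + 0.5    # probability of reporting +C
    return c if random.random() < p else -c

def estimate_mean(values, eps):
    """Aggregator side: each user sends only her perturbed report, and the
    aggregator averages them. Because each report is unbiased, the average
    is an unbiased estimate of the true mean."""
    reports = [duchi_perturb(v, eps) for v in values]
    return sum(reports) / len(reports)
```

Each report lies in {-C, +C} with C = (e^ε + 1)/(e^ε - 1), and the perturbation is unbiased, so averaging the reports yields an unbiased mean estimate; the paper's contribution is to reduce the worst-case noise variance of such estimates, for single numeric attributes and for multidimensional data with mixed attribute types.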
