Utility-Optimized Local Differential Privacy Mechanisms for Distribution Estimation

LDP (Local Differential Privacy) has been widely studied to estimate statistics of personal data (e.g., the distribution underlying the data) while protecting users' privacy. Although LDP does not require a trusted third party, it regards all personal data as equally sensitive, which causes excessive obfuscation and hence a loss of utility. In this paper, we introduce the notion of ULDP (Utility-optimized LDP), which provides a privacy guarantee equivalent to LDP only for sensitive data. We first consider the setting where all users use the same obfuscation mechanism, and propose two mechanisms providing ULDP: utility-optimized randomized response and utility-optimized RAPPOR. We then consider the setting where the distinction between sensitive and non-sensitive data can differ from user to user. For this setting, we propose a personalized ULDP mechanism with semantic tags to estimate the distribution of personal data with high utility while keeping secret what is sensitive for each user. We show theoretically and experimentally that our mechanisms provide much higher utility than existing LDP mechanisms when a large portion of the data is non-sensitive. We also show that when most of the data are non-sensitive, our mechanisms provide almost the same utility as non-private mechanisms in the low privacy regime.
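As a rough illustration of the utility-optimized randomized response idea, the sketch below applies ε-randomized response within the sensitive set, while reporting non-sensitive values truthfully with high probability and otherwise mapping them uniformly into the sensitive set. The function name `u_rr` and the specific parameter choice `c = 1/(e^ε + |X_S| - 1)` are illustrative assumptions for this sketch, not necessarily the optimized constants from the paper; the choice does, however, make every output in the sensitive set satisfy the e^ε indistinguishability ratio across all inputs, which is the core ULDP requirement.

```python
import math
import random

def u_rr(x, sensitive, eps, rng=random):
    """Illustrative utility-optimized randomized response (sketch).

    - Sensitive values: eps-randomized response restricted to the
      sensitive set, so sensitive inputs only ever emit sensitive outputs.
    - Non-sensitive values: reported truthfully with probability
      1 - |X_S|*c, otherwise mapped uniformly into the sensitive set.
    With c = 1/(e^eps + |X_S| - 1), the probability of any sensitive
    output y is between c and c*e^eps for every input, so observing a
    sensitive output reveals at most an e^eps likelihood ratio.
    """
    sensitive = list(sensitive)
    s = len(sensitive)
    c = 1.0 / (math.exp(eps) + s - 1)  # baseline prob. of each sensitive output
    if x in sensitive:
        # Standard randomized response over the sensitive set only.
        if rng.random() < c * math.exp(eps):
            return x
        others = [y for y in sensitive if y != x]
        return rng.choice(others)
    # Non-sensitive: truthful report is "invertible" (only x maps to x),
    # which is what preserves utility for non-sensitive data.
    if rng.random() < 1.0 - s * c:
        return x
    return rng.choice(sensitive)
```

Because a non-sensitive output can only have been produced by that same input value, the aggregator can invert the mechanism and estimate the frequencies of non-sensitive values almost as accurately as without privacy, which is the intuition behind the utility gains claimed above.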