Subset Privacy: Draw from an Obfuscated Urn

With the rapidly increasing ability to collect and analyze personal data, data privacy has become a pressing concern. In this work, we develop a new statistical notion of local privacy to protect categorical data collected by untrusted entities. The proposed solution, named subset privacy, privatizes the original data value by replacing it with a random subset that contains that value. We develop methods, with theoretical guarantees, for estimating distribution functions and testing independence from subset-private data. We also study different mechanisms for realizing subset privacy and evaluation metrics that quantify the amount of privacy achieved in practice. Experimental results on both simulated and real-world datasets demonstrate the encouraging performance of the developed concepts and methods.
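The core idea of the abstract can be sketched in a few lines: instead of reporting a categorical value directly, the data holder reports a random subset guaranteed to contain it. The sketch below is a minimal illustration, not the paper's actual mechanism; the function name `subset_privatize` and the choice of including each other category independently with probability `p` are assumptions for exposition only (the paper studies several mechanisms).

```python
import random


def subset_privatize(value, categories, p=0.5, rng=None):
    """Replace `value` with a random subset of `categories` containing it.

    Each category other than `value` is included independently with
    probability `p`. The true value is always a member of the reported
    subset, so the observer learns only that the value lies in the subset.
    """
    rng = rng or random.Random()
    subset = {value}
    for c in categories:
        if c != value and rng.random() < p:
            subset.add(c)
    return subset
```

For example, `subset_privatize("A", ["A", "B", "C"])` might return `{"A", "C"}`; the reported subset always contains the true value, and larger `p` yields larger subsets and hence more privacy at the cost of less information for downstream estimation.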
