Marginal Release Under Local Differential Privacy

Many analysis and machine learning tasks require the availability of marginal statistics on multidimensional datasets while providing strong privacy guarantees for the data subjects. Applications for these statistics range from finding correlations in the data to fitting sophisticated prediction models. In this paper, we provide a set of algorithms for materializing marginal statistics under the strong model of local differential privacy. We prove the first tight theoretical bounds on the accuracy of marginals compiled under each approach, perform empirical evaluation to confirm these bounds, and evaluate them for tasks such as modeling and correlation testing. Our results show that releasing information based on (local) Fourier transformations of the input is preferable to alternatives based directly on (local) marginals.

[1]  Alan Agresti,et al.  Categorical Data Analysis , 2003 .

[2]  Raef Bassily,et al.  Practical Locally Private Heavy Hitters , 2017, NIPS.

[3]  Yin Yang,et al.  Heavy Hitter Estimation over Set-Valued Data with Local Differential Privacy , 2016, CCS.

[4]  James H. Martin,et al.  Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd Edition , 2000, Prentice Hall series in artificial intelligence.

[5]  Vitaly Shmatikov,et al.  How To Break Anonymity of the Netflix Prize Dataset , 2006, ArXiv.

[6]  Ryan M. Rogers,et al.  Differentially Private Chi-Squared Hypothesis Testing: Goodness of Fit and Independence Testing , 2016, ICML 2016.

[7]  Raef Bassily,et al.  Local, Private, Efficient Protocols for Succinct Histograms , 2015, STOC.

[8]  Andreas Haeberlen,et al.  Verifiable differential privacy , 2015, EuroSys.

[9]  Cynthia Dwork,et al.  Privacy-Preserving Datamining on Vertically Partitioned Databases , 2004, CRYPTO.

[10]  S L Warner,et al.  Randomized response: a survey technique for eliminating evasive answer bias. , 1965, Journal of the American Statistical Association.

[11]  Pierangela Samarati,et al.  Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression , 1998 .

[12]  Hongxia Jin,et al.  Private spatial data aggregation in the local setting , 2016, 2016 IEEE 32nd International Conference on Data Engineering (ICDE).

[13]  Alexandre V. Evfimievski,et al.  Privacy preserving mining of association rules , 2002, Inf. Syst..

[14]  Salil P. Vadhan,et al.  The Complexity of Differential Privacy , 2017, Tutorials on the Foundations of Cryptography.

[15]  Ninghui Li,et al.  Locally Differentially Private Protocols for Frequency Estimation , 2017, USENIX Security Symposium.

[16]  Kunal Talwar,et al.  Mechanism Design via Differential Privacy , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[17]  T. Sanders,et al.  Analysis of Boolean Functions , 2012, ArXiv.

[18]  Pramod Viswanath,et al.  Extremal Mechanisms for Local Differential Privacy , 2014, J. Mach. Learn. Res..

[19]  Úlfar Erlingsson,et al.  RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response , 2014, CCS.

[20]  Janardhan Kulkarni,et al.  Collecting Telemetry Data Privately , 2017, NIPS.

[21]  Guy N. Rothblum,et al.  A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[22]  Thomas G. Dietterich,et al.  Advances in Neural Information Processing Systems 13, Papers from Neural Information Processing Systems (NIPS) 2000, Denver, CO, USA , 2001, NIPS.

[23]  Anand D. Sarwate,et al.  Signal Processing and Machine Learning with Differential Privacy: Algorithms and Challenges for Continuous Data , 2013, IEEE Signal Processing Magazine.

[24]  Úlfar Erlingsson,et al.  Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries , 2015, Proc. Priv. Enhancing Technol..

[25]  B. Efron,et al.  The Jackknife Estimate of Variance , 1981 .

[26]  Ninghui Li,et al.  t-Closeness: Privacy Beyond k-Anonymity and l-Diversity , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[27]  Daniel A. Spielman,et al.  Spectral Graph Theory and its Applications , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[28]  Johannes Gehrke,et al.  Differential privacy via wavelet transforms , 2009, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[29]  Arjun Narayan,et al.  Distributed Differential Privacy and Applications , 2015 .

[30]  Yin Yang,et al.  Collecting and Analyzing Data from Smart Device Users with Local Differential Privacy , 2016, ArXiv.

[31]  A. Chaudhuri,et al.  Randomized Response: Theory and Techniques , 1987 .

[32]  Pramod Viswanath,et al.  Differentially Private Multi-party Computation: Optimality of Non-Interactive Randomized Response , 2014, 1407.1546.

[33]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[34]  Ashwin Machanavajjhala,et al.  Differential Privacy in the Wild: A Tutorial on Current Practices & Open Challenges , 2016, Proc. VLDB Endow..

[35]  C. N. Liu,et al.  Approximating discrete probability distributions with dependence trees , 1968, IEEE Trans. Inf. Theory.

[36]  Marianne Winslett,et al.  Differentially private data cubes: optimizing noise sources and consistency , 2011, SIGMOD '11.

[37]  Salil Vadhan,et al.  17 58 v 3 [ cs . D S ] 1 4 M ar 2 01 4 Faster Algorithms for Privately Releasing Marginals ∗ , 2018 .

[38]  Aaron Roth,et al.  Privately releasing conjunctions and the statistical query barrier , 2010, STOC '11.

[39]  Katrina Ligett,et al.  A Simple and Practical Algorithm for Differentially Private Data Release , 2010, NIPS.

[40]  F. Maxwell Harper,et al.  The MovieLens Datasets: History and Context , 2016, TIIS.

[41]  Cynthia Dwork,et al.  Privacy, accuracy, and consistency too: a holistic solution to contingency table release , 2007, PODS.

[42]  Irit Dinur,et al.  Revealing information while preserving privacy , 2003, PODS.

[43]  Martin J. Wainwright,et al.  Local privacy and statistical minimax rates , 2013, 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton).