RON-Gauss: Enhancing Utility in Non-Interactive Private Data Release

Abstract: A key challenge in designing differentially private mechanisms for the non-interactive setting is maintaining the utility of the released data. To address this challenge, we use the Diaconis-Freedman-Meckes (DFM) effect, which states that most projections of high-dimensional data are nearly Gaussian. We propose the RON-Gauss model, which combines dimensionality reduction via random orthonormal (RON) projection with a Gaussian generative model to synthesize differentially private data. We analyze how RON-Gauss benefits from the DFM effect and present multiple algorithms for a range of machine learning applications, covering both unsupervised and supervised learning. Furthermore, we rigorously prove that (a) our algorithms satisfy the strong ε-differential privacy guarantee, and (b) RON projection can lower the level of perturbation required for differential privacy. Finally, we illustrate the effectiveness of RON-Gauss in three common machine learning applications (clustering, classification, and regression) on three large real-world datasets. Our empirical results show that (a) RON-Gauss outperforms previous approaches by up to an order of magnitude, and (b) the loss in utility compared to the non-private real data is small. Thus, RON-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.
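
The two ingredients named in the abstract, a random orthonormal projection followed by a Gaussian generative model, can be illustrated with a minimal sketch. It assumes NumPy; the function names `ron_projection` and `gaussian_synthesize`, the mean-centering step, and the toy data are illustrative assumptions rather than the paper's algorithms. The sketch deliberately omits the noise that would have to be added for ε-differential privacy, so its output is not private as written.

```python
import numpy as np


def ron_projection(d, p, rng):
    """Return a d x p matrix whose columns are random and orthonormal."""
    # The QR factorization of a Gaussian matrix yields an orthonormal
    # basis for a randomly oriented p-dimensional subspace.
    q, _ = np.linalg.qr(rng.standard_normal((d, p)))
    return q


def gaussian_synthesize(x, p, n_samples, rng):
    """Project x (n x d) to p dimensions, fit a Gaussian, and sample from it.

    Assumption: this is a NON-private illustration. No noise is added to
    the covariance, so no differential privacy guarantee holds.
    """
    x_centered = x - x.mean(axis=0)          # assumed preprocessing step
    w = ron_projection(x.shape[1], p, rng)   # d x p projection matrix
    z = x_centered @ w                       # n x p projected data
    cov = z.T @ z / z.shape[0]               # empirical covariance (p x p)
    synthetic = rng.multivariate_normal(np.zeros(p), cov, size=n_samples)
    return w, synthetic


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(500, 50))   # toy high-dimensional data
w, synthetic = gaussian_synthesize(x, p=5, n_samples=200, rng=rng)
```

Here `w` has orthonormal columns, so the projection preserves norms within the chosen subspace, and `synthetic` holds samples drawn from the fitted Gaussian rather than the original records.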

[62]  Jerome P. Reiter,et al.  Differential Privacy and Statistical Disclosure Risk Measures: An Investigation with Binary Synthetic Data , 2012, Trans. Data Priv..

[63]  S. Kung Kernel Methods and Machine Learning , 2014 .

[64]  Sharon Goldberg,et al.  Calibrating Data to Sensitivity in Private Data Analysis , 2012, Proc. VLDB Endow..

[65]  Dawn Xiaodong Song,et al.  On the Feasibility of Internet-Scale Author Identification , 2012, 2012 IEEE Symposium on Security and Privacy.

[66]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[67]  Trevor Hastie,et al.  The elements of statistical learning. 2001 , 2001 .

[68]  M. Köppen,et al.  The Curse of Dimensionality , 2010 .

[69]  Divesh Srivastava,et al.  Differentially Private Spatial Decompositions , 2011, 2012 IEEE 28th International Conference on Data Engineering.

[70]  Frank McSherry,et al.  Probabilistic Inference and Differential Privacy , 2010, NIPS.

[71]  Andrew Beng Jin Teoh,et al.  Biometric hash: high-confidence face recognition , 2006, IEEE Transactions on Circuits and Systems for Video Technology.

[72]  Claude Castelluccia,et al.  Differentially Private Histogram Publishing through Lossy Compression , 2012, 2012 IEEE 12th International Conference on Data Mining.

[73]  Aaron Roth,et al.  Iterative Constructions and Private Data Release , 2011, TCC.

[74]  Larry A. Wasserman,et al.  Differential privacy with compression , 2009, 2009 IEEE International Symposium on Information Theory.

[75]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[76]  Dan Suciu,et al.  Boosting the accuracy of differentially private histograms through consistency , 2009, Proc. VLDB Endow..

[77]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[78]  Jonathan Ullman,et al.  PCPs and the Hardness of Generating Private Synthetic Data , 2011, TCC.

[79]  L. Hubert,et al.  Comparing partitions , 1985 .

[80]  Rocco A. Servedio,et al.  Private data release via learning thresholds , 2011, SODA.

[81]  Ignacio Rojas,et al.  Design, implementation and validation of a novel open framework for agile development of mobile health applications , 2015, BioMedical Engineering OnLine.

[82]  Ninghui Li,et al.  Understanding Hierarchical Methods for Differentially Private Histograms , 2013, Proc. VLDB Endow..

[83]  Ker-Chau Li,et al.  On almost Linearity of Low Dimensional Projections from High Dimensional Data , 1993 .

[84]  François Kawala,et al.  Prédictions d'activité dans les réseaux sociaux en ligne , 2013 .

[85]  Leonhard Held,et al.  Gaussian Markov Random Fields: Theory and Applications , 2005 .

[86]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[87]  Héctor Pomares,et al.  A benchmark dataset to evaluate sensor displacement in activity recognition , 2012, UbiComp.

[88]  Philip S. Yu,et al.  Differentially private data release for data mining , 2011, KDD.

[89]  Vitaly Shmatikov,et al.  "You Might Also Like:" Privacy Risks of Collaborative Filtering , 2011, 2011 IEEE Symposium on Security and Privacy.

[90]  Zhicong Huang,et al.  Differential Privacy with Bounded Priors: Reconciling Utility and Privacy in Genome-Wide Association Studies , 2015, CCS.

[91]  Frank McSherry,et al.  Privacy integrated queries: an extensible platform for privacy-preserving data analysis , 2009, SIGMOD Conference.

[92]  Anand D. Sarwate,et al.  Differentially Private Empirical Risk Minimization , 2009, J. Mach. Learn. Res..

[93]  Ian Goodfellow,et al.  Deep Learning with Differential Privacy , 2016, CCS.

[94]  Moni Naor,et al.  Our Data, Ourselves: Privacy Via Distributed Noise Generation , 2006, EUROCRYPT.

[95]  Julia Hirschberg,et al.  V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure , 2007, EMNLP.

[96]  Li Xiong,et al.  Protecting Locations with Differential Privacy under Temporal Correlations , 2014, CCS.

[97]  Fabian Prasser,et al.  SafePub: A Truthful Data Anonymization Algorithm With Strong Privacy Guarantees , 2018, Proc. Priv. Enhancing Technol..

[98]  A. Atiya,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2005, IEEE Transactions on Neural Networks.

[99]  Claire McKay Bowen,et al.  Differentially Private Data Synthesis Methods , 2016 .

[100]  Ashwin Machanavajjhala,et al.  Principled Evaluation of Differentially Private Algorithms using DPBench , 2015, SIGMOD Conference.

[101]  Günter Rote,et al.  A New Metric Between Polygons and How to Compute it , 1992, ICALP.

[102]  Úlfar Erlingsson,et al.  Building a RAPPOR with the Unknown: Privacy-Preserving Learning of Associations and Data Dictionaries , 2015, Proc. Priv. Enhancing Technol..

[103]  David Leoni,et al.  Non-interactive differential privacy: a survey , 2012, WOD.

[104]  Vladimir Vapnik,et al.  Estimation of Dependences Based on Empirical Data: Empirical Inference Science (Information Science and Statistics) , 2006 .

[105]  D. Freedman,et al.  Asymptotics of Graphical Projection Pursuit , 1984 .

[106]  Vitaly Shmatikov,et al.  Membership Inference Attacks Against Machine Learning Models , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[107]  Massimo Barbaro,et al.  A Face Is Exposed for AOL Searcher No , 2006 .

[108]  Cynthia Dwork,et al.  Differential Privacy: A Survey of Results , 2008, TAMC.

[109]  C. Stein,et al.  Estimation with Quadratic Loss , 1992 .

[110]  Úlfar Erlingsson,et al.  RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response , 2014, CCS.

[111]  Andreas Haeberlen,et al.  Differential Privacy Under Fire , 2011, USENIX Security Symposium.

[112]  R. Rojas Why the Normal Distribution ? , 2010 .

[113]  Mitchell H. Tsai,et al.  The Curse of Dimensionality. , 2018, Anesthesiology.

[114]  Ju Ren,et al.  DPPro: Differentially Private High-Dimensional Data Release via Random Projection , 2017, IEEE Transactions on Information Forensics and Security.

[115]  Charles Elkan,et al.  Differential Privacy and Machine Learning: a Survey and Review , 2014, ArXiv.

[116]  Li Zhang,et al.  Analyze gauss: optimal bounds for privacy-preserving principal component analysis , 2014, STOC.

[117]  Ashwin Machanavajjhala,et al.  Privacy: Theory meets Practice on the Map , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[118]  Anand D. Sarwate,et al.  Near-optimal Differentially Private Principal Components , 2012, NIPS.

[119]  V. Vapnik Estimation of Dependences Based on Empirical Data , 2006 .

[120]  Sofya Raskhodnikova,et al.  Smooth sensitivity and sampling in private data analysis , 2007, STOC '07.

[121]  Yue Wang,et al.  A Data- and Workload-Aware Algorithm for Range Queries Under Differential Privacy , 2014, ArXiv.

[122]  Yin Yang,et al.  Differentially Private Histogram Publication , 2012, ICDE.

[123]  Ninghui Li,et al.  Differentially Private Publishing of High-dimensional Data Using Sensitivity Control , 2015, AsiaCCS.

[124]  H. Hotelling Analysis of a complex of statistical variables into principal components. , 1933 .

[125]  Jianliang Xu,et al.  Towards Accurate Histogram Publication under Differential Privacy , 2014, SDM.

[126]  Xiaoqian Jiang,et al.  Differential-Private Data Publishing Through Component Analysis , 2013, Trans. Data Priv..