Paper information - RON-Gauss: Enhancing Utility in Non-Interactive Private Data Release

Abstract: A key challenge in designing differentially private mechanisms for the non-interactive setting is maintaining the utility of the released data. To overcome this challenge, we utilize the Diaconis-Freedman-Meckes (DFM) effect, which states that most projections of high-dimensional data are nearly Gaussian. Hence, we propose the RON-Gauss model, which combines dimensionality reduction via random orthonormal (RON) projection with a Gaussian generative model to synthesize differentially private data. We analyze how RON-Gauss benefits from the DFM effect and present multiple algorithms for a range of machine learning applications, covering both unsupervised and supervised learning. Furthermore, we rigorously prove that (a) our algorithms satisfy the strong ε-differential privacy guarantee, and (b) RON projection can lower the level of perturbation required for differential privacy. Finally, we illustrate the effectiveness of RON-Gauss in three common machine learning applications (clustering, classification, and regression) on three large real-world datasets. Our empirical results show that (a) RON-Gauss outperforms previous approaches by up to an order of magnitude, and (b) the loss in utility compared to the non-private real data is small. Thus, RON-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.
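To make the combination described in the abstract concrete, the following is a minimal sketch of the general idea only: project the data with a random orthonormal matrix, fit a Gaussian in the reduced space, perturb its covariance, and sample synthetic points. It is not the paper's algorithm. The function names (`ron_matrix`, `ron_gauss_sketch`), the QR-based construction, the eigenvalue clipping, and the `noise_scale` parameter are all assumptions made for illustration. In particular, the noise here is not calibrated to any sensitivity and the centering uses the non-private mean, so the code as written carries no ε-differential privacy guarantee.

```python
import numpy as np


def ron_matrix(m, p, rng):
    """Return an m x p matrix with orthonormal columns (assumes p <= m).

    Built here by QR-factorizing a random Gaussian matrix; this construction
    is an assumption of the sketch, not taken from the abstract.
    """
    q, _ = np.linalg.qr(rng.standard_normal((m, p)))
    return q


def ron_gauss_sketch(x, p, noise_scale, n_out, rng):
    """Illustrative only: NOT a differentially private mechanism as written.

    x           : (n, m) array of real-valued records
    p           : reduced dimension
    noise_scale : hypothetical Laplace scale; a real mechanism would derive
                  it from the sensitivity of the covariance and from epsilon
    n_out       : number of synthetic samples to draw
    """
    n, m = x.shape
    w = ron_matrix(m, p, rng)

    # Center (with the non-private mean, for simplicity) and project to p dims.
    z = (x - x.mean(axis=0)) @ w

    # Sample covariance in the projected space.
    cov = z.T @ z / n

    # Add symmetric Laplace noise (uncalibrated placeholder).
    noise = rng.laplace(scale=noise_scale, size=(p, p))
    noise = np.triu(noise) + np.triu(noise, 1).T
    cov_noisy = cov + noise

    # Clip negative eigenvalues so the matrix is a valid covariance.
    vals, vecs = np.linalg.eigh(cov_noisy)
    cov_psd = (vecs * np.clip(vals, 0.0, None)) @ vecs.T

    # Draw synthetic records from the zero-mean Gaussian model.
    synth = rng.multivariate_normal(np.zeros(p), cov_psd, size=n_out)
    return synth, w


rng = np.random.default_rng(0)
x = rng.standard_normal((200, 50))
synth, w = ron_gauss_sketch(x, p=5, noise_scale=0.01, n_out=100, rng=rng)
print(synth.shape)
```

The reduced dimension matters in this sketch because the perturbed object is a p x p covariance rather than an m x m one, which is one intuitive reading of the abstract's claim that RON projection can lower the required perturbation; the formal argument is in the paper itself.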
