Gaussian Mixture Models for Classification and Hypothesis Tests Under Differential Privacy

Many statistical models are built from basic statistics: mean vectors, variances, and covariances. Gaussian mixture models are one such class. When a data set contains sensitive information and cannot be released directly to users, these models can still be constructed from noise-added query responses and thereby provide preliminary results to users. Although the queried basic statistics satisfy the differential privacy guarantee, the more complex models constructed from these statistics may not. It is nonetheless up to the users to decide how to query a database and how to further utilize the responses. In this article, our goal is to understand the impact of differential privacy mechanisms on Gaussian mixture models. Our approach is to query basic statistics from a database under differential privacy protection and use the noise-added responses to build classifiers and perform hypothesis tests. We find that adding Laplace noise can have a non-negligible effect on model outputs; for example, the variance-covariance matrix may no longer be positive definite after noise addition. We propose a heuristic algorithm to repair the noise-added variance-covariance matrix. We then examine the classification error incurred by using the noise-added responses, through experiments with both simulated and real-life data, and demonstrate under which conditions the impact of the added noise can be reduced. We compute the exact type I and type II errors under differential privacy for the one-sample z-test, the one-sample t-test, and the two-sample t-test with equal variances, and show under which conditions a hypothesis test returns reliable results given differentially private means, variances, and covariances.
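To make the positive-definiteness issue concrete, here is a minimal Python sketch (our own illustration; the function names, sensitivity and epsilon values, and the eigenvalue-clipping repair are assumptions, not the paper's heuristic algorithm) of how entry-wise Laplace noise can destroy positive definiteness of a variance-covariance matrix and how a clipped eigendecomposition can restore it:

```python
# Minimal sketch (not the paper's algorithm): perturb a covariance matrix with
# entry-wise Laplace noise, then repair it by clipping negative eigenvalues.
# The sensitivity and epsilon values below are illustrative placeholders.
import numpy as np

def laplace_noisy_covariance(cov, sensitivity, epsilon, rng):
    """Add symmetric Laplace noise of scale sensitivity/epsilon to each entry."""
    d = cov.shape[0]
    noise = np.triu(rng.laplace(scale=sensitivity / epsilon, size=(d, d)))
    return cov + noise + np.triu(noise, 1).T  # mirror upper triangle for symmetry

def repair_psd(mat, floor=1e-6):
    """Clip eigenvalues below `floor` so the matrix is positive definite again
    (eigenvalue clipping is one common heuristic, not necessarily the paper's)."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(np.clip(vals, floor, None)) @ vecs.T

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                     # toy 3-dimensional data set
noisy = laplace_noisy_covariance(np.cov(X, rowvar=False), 1.0, 0.5, rng)
print(np.linalg.eigvalsh(noisy))                 # may contain negative eigenvalues
print(np.linalg.eigvalsh(repair_psd(noisy)))     # all eigenvalues >= floor
```

Similarly, the inflation of the type I error of a one-sample z-test computed from a noise-added mean can be checked by simulation (again an illustrative construction and an assumed sensitivity model; the paper computes these errors exactly):

```python
# Illustrative Monte Carlo (the paper derives these errors exactly): type I
# error of a two-sided one-sample z-test when the released sample mean carries
# Laplace noise. We assume each record contributes at most `sensitivity` to the
# sum, so the mean query is perturbed at Laplace scale sensitivity / (n * epsilon).
import numpy as np

def noisy_ztest_type1(n=100, sigma=1.0, sensitivity=1.0, epsilon=1.0,
                      trials=100_000, seed=0):
    rng = np.random.default_rng(seed)
    xbar = rng.normal(0.0, sigma / np.sqrt(n), size=trials)   # under H0: mu = 0
    noisy = xbar + rng.laplace(scale=sensitivity / (n * epsilon), size=trials)
    z = noisy / (sigma / np.sqrt(n))
    return np.mean(np.abs(z) > 1.959963985)  # nominal two-sided level 0.05

print(noisy_ztest_type1())  # exceeds 0.05: the added noise inflates type I error
```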
