On the Theory and Practice of Privacy-Preserving Bayesian Data Analysis

Bayesian inference has great promise for the privacy-preserving analysis of sensitive data, as posterior sampling automatically preserves differential privacy, an algorithmic notion of data privacy, under certain conditions (Dimitrakakis et al., 2014; Wang et al., 2015). While this one posterior sample (OPS) approach elegantly provides privacy "for free," it is data inefficient in the sense of asymptotic relative efficiency (ARE). We show that a simple alternative based on the Laplace mechanism, the workhorse of differential privacy, is as asymptotically efficient as non-private posterior inference, under general assumptions. This technique also has practical advantages including efficient use of the privacy budget for MCMC. We demonstrate the practicality of our approach on a time-series analysis of sensitive military records from the Afghanistan and Iraq wars disclosed by the WikiLeaks organization.
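As a minimal illustration (not the paper's own code), the Laplace mechanism referred to above can be sketched in Python: it releases a query answer perturbed by Laplace noise with scale calibrated to the query's L1 sensitivity divided by the privacy parameter epsilon (Dwork et al., 2006). The function name, dataset, and parameter values here are invented for the example.

```python
import math
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value plus Laplace(0, sensitivity/epsilon) noise.

    For a query with L1 sensitivity `sensitivity`, this release satisfies
    epsilon-differential privacy (Dwork et al., 2006).
    """
    rng = rng or random
    scale = sensitivity / epsilon
    # Sample Laplace(0, scale) by inverse-CDF transform of a
    # uniform draw on (-0.5, 0.5).
    u = rng.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise


# Example: privately release the mean of a dataset with records in [0, 1].
data = [0.2, 0.9, 0.4, 0.7]
sensitivity = 1.0 / len(data)  # one changed record moves the mean by <= 1/n
private_mean = laplace_mechanism(sum(data) / len(data), sensitivity,
                                 epsilon=1.0, rng=random.Random(0))
```

A smaller epsilon (stronger privacy) inflates the noise scale, which is the efficiency trade-off the paper analyzes against OPS.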

[1] Christos Dimitrakakis, et al. Robust and Private Bayesian Inference, 2013, ALT.

[2] Guy N. Rothblum, et al. Boosting and Differential Privacy, 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[3] Ninghui Li, et al. On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy, 2011, ASIACCS '12.

[4] Stephen J. Roberts, et al. Probabilistic Modeling in Bioinformatics and Medical Informatics, 2010.

[5] Justin Reich, et al. Privacy, anonymity, and big data in the social sciences, 2014, Commun. ACM.

[6] Aaron Roth, et al. The Algorithmic Foundations of Differential Privacy, 2014, Found. Trends Theor. Comput. Sci.

[7] Aarti Singh, et al. Differentially private subspace clustering, 2015, NIPS.

[8] Thomas L. Griffiths, et al. A fully Bayesian approach to unsupervised part-of-speech tagging, 2007, ACL.

[9] Zhenghao Chen, et al. Tuned Models of Peer Assessment in MOOCs, 2013, EDM.

[10] Z. Geng, et al. Statistical Inference for Truncated Dirichlet Distribution and Its Application in Misclassification, 2000.

[11] Stephen T. Joy. The Differential Privacy of Bayesian Inference, 2015.

[12] Yee Whye Teh, et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.

[13] Alexander J. Smola, et al. Privacy for Free: Posterior Sampling and Stochastic Gradient Monte Carlo, 2015, ICML.

[14] Cynthia Dwork, et al. Calibrating Noise to Sensitivity in Private Data Analysis, 2006, TCC.

[15] Anand D. Sarwate, et al. Stochastic gradient descent with differentially private updates, 2013, 2013 IEEE Global Conference on Signal and Information Processing.

[16] Sampath Kannan, et al. The Exponential Mechanism for Social Welfare: Private, Truthful, and Nearly Optimal, 2012, 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science.

[17] L. Brown. Fundamentals of statistical exponential families: with applications in statistical decision theory, 1986.

[18] Ruslan Salakhutdinov, et al. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo, 2008, ICML '08.

[19] Radford M. Neal. Annealed importance sampling, 1998, Stat. Comput.

[20] Stephen Tyree, et al. Learning with Marginalized Corrupted Features, 2013, ICML.

[21] L. Tierney, et al. The validity of posterior expansions based on Laplace's method, 1990.

[22] Xinjia Chen, et al. A New Generalization of Chebyshev Inequality for Random Vectors, 2007, ArXiv.

[23] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[24] Kunal Talwar, et al. Mechanism Design via Differential Privacy, 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[25] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.