Differentially private posterior summaries for linear regression coefficients

In Bayesian regression modeling, analysts often summarize inferences using posterior probabilities and quantiles, such as the posterior probability that a coefficient exceeds zero or the posterior median of that coefficient. However, when outcomes and explanatory variables are potentially unbounded, regression inferences based on typical prior distributions can be sensitive to the values of individual data points. Releasing posterior summaries of regression coefficients can therefore create disclosure risks. In this article, we propose differentially private algorithms for reporting posterior probabilities and posterior quantiles of linear regression coefficients. The algorithms use the general strategy of subsample and aggregate, which randomly partitions the data into disjoint subsets, estimates the regression within each subset, and combines the results in ways that satisfy differential privacy. We illustrate the performance of some of the algorithms using repeated sampling studies. The non-private versions of the algorithms can also be used for Bayesian inference with big data when privacy protection is not required.
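
The subsample-and-aggregate strategy described above can be illustrated with a minimal Python sketch. It is not the algorithm proposed in the article. It assumes a normal approximation to each subset's posterior, a simple average as the aggregation step, and the Laplace mechanism for noise; the function names and parameters (`subset_posterior_prob`, `dp_posterior_prob`, `n_subsets`) are hypothetical. Because each record falls in exactly one subset and each subset contributes a value in [0, 1], changing one record moves the average by at most 1/M, where M is the number of subsets.

```python
import math
import numpy as np


def subset_posterior_prob(X, y, j):
    """Approximate P(beta_j > 0 | subset) from an OLS fit.

    Assumption: the posterior of beta_j is treated as normal, centered at
    the OLS estimate with the usual standard error (flat-prior approximation).
    """
    n, p = X.shape
    if n <= p:
        raise ValueError("each subset needs more rows than coefficients")
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    sigma2 = resid @ resid / (n - p)
    cov = sigma2 * np.linalg.pinv(X.T @ X)
    se = math.sqrt(max(cov[j, j], 0.0))
    if se == 0.0:
        # Degenerate perfect fit: all posterior mass on one side.
        return 1.0 if beta_hat[j] > 0 else 0.0
    z = beta_hat[j] / se
    # Standard normal CDF evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dp_posterior_prob(X, y, j, epsilon, n_subsets, rng):
    """Epsilon-DP estimate of P(beta_j > 0 | data) via subsample and aggregate."""
    n = X.shape[0]
    # Randomly partition the row indices into disjoint subsets.
    parts = np.array_split(rng.permutation(n), n_subsets)
    probs = np.array([subset_posterior_prob(X[idx], y[idx], j) for idx in parts])
    # The average of M values in [0, 1] has sensitivity 1/M, so Laplace noise
    # with scale 1 / (M * epsilon) gives epsilon-differential privacy.
    noisy = probs.mean() + rng.laplace(loc=0.0, scale=1.0 / (n_subsets * epsilon))
    # Clipping is post-processing and does not affect the privacy guarantee.
    return float(np.clip(noisy, 0.0, 1.0))
```

For example, with a design matrix `X` of shape `(2000, 2)`, a response `y`, and `rng = np.random.default_rng(0)`, calling `dp_posterior_prob(X, y, j=1, epsilon=1.0, n_subsets=20, rng=rng)` returns a noisy probability in [0, 1]. Using more subsets reduces the noise scale but makes each within-subset fit less precise, so the choice of M trades privacy noise against estimation error.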
