Differentially private methods for managing model uncertainty in linear regression models

In this work, we propose differentially private methods for hypothesis testing, model averaging, and model selection for normal linear models. We consider Bayesian methods based on mixtures of $g$-priors and non-Bayesian methods based on likelihood-ratio statistics and information criteria. The procedures are asymptotically consistent and straightforward to implement with existing software. We focus on practical issues such as adjusting critical values so that hypothesis tests have adequate type I error rates and quantifying the uncertainty introduced by the privacy-ensuring mechanisms.
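To make the point about adjusting critical values concrete, the following is a minimal sketch and not the procedure proposed in this work. It assumes, purely for illustration, that a test statistic with a chi-square(1) null distribution is released with additive Laplace noise of scale `sensitivity / epsilon`, and it calibrates the critical value by Monte Carlo simulation of the noisy null distribution. The function names, the bounded `sensitivity`, and the choice of the Laplace mechanism are all hypothetical.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def simulate_noisy_null_stats(scale, n_sims, rng):
    """Simulate T + noise under H0, assuming T ~ chi-square(1) (illustrative)."""
    return [rng.gauss(0.0, 1.0) ** 2 + laplace_noise(scale, rng)
            for _ in range(n_sims)]


def adjusted_critical_value(alpha, scale, n_sims=100_000, seed=0):
    """Monte Carlo (1 - alpha) quantile of the noisy null distribution."""
    rng = random.Random(seed)
    sims = sorted(simulate_noisy_null_stats(scale, n_sims, rng))
    return sims[math.ceil((1.0 - alpha) * n_sims) - 1]


def type_one_error(critical_value, scale, n_sims=100_000, seed=1):
    """Estimated rejection rate under H0 for a given critical value."""
    rng = random.Random(seed)
    sims = simulate_noisy_null_stats(scale, n_sims, rng)
    return sum(s > critical_value for s in sims) / n_sims


if __name__ == "__main__":
    sensitivity = 1.0   # hypothetical bound on the statistic's sensitivity
    epsilon = 0.5       # privacy budget (assumed value)
    scale = sensitivity / epsilon

    naive = 3.84        # approximate 95% quantile of chi-square(1), ignoring noise
    adjusted = adjusted_critical_value(0.05, scale)

    print("type I error, naive cutoff:   ", type_one_error(naive, scale))
    print("type I error, adjusted cutoff:", type_one_error(adjusted, scale))
```

Under these assumptions, the unadjusted cutoff rejects too often because the added noise fattens the tails of the null distribution, while the simulated cutoff restores a rejection rate close to the nominal level.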
