Revisiting differentially private linear regression: optimal and adaptive prediction & estimation in unbounded domain

We revisit the problem of linear regression under a differential privacy constraint. By consolidating existing pieces in the literature, we clarify the correct dependence of the optimization error and the estimation error on the feature, label and coefficient domains, thereby revealing the delicate price of differential privacy in statistical estimation and statistical learning. Moreover, we propose simple modifications of two existing DP algorithms, (a) posterior sampling and (b) sufficient statistics perturbation, and show that they can be upgraded into **adaptive** algorithms, AdaOPS and AdaSSP respectively, that exploit data-dependent quantities and behave nearly optimally **for every instance**. Extensive experiments on both simulated and real data show that both AdaOPS and AdaSSP outperform existing techniques on nearly all of the 36 data sets we tested.
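To make the idea behind sufficient statistics perturbation concrete, here is a minimal Python/NumPy sketch of the plain, non-adaptive baseline. It releases noisy versions of X^T X and X^T y and then solves the ridge-regularized normal equations. This is not AdaSSP as proposed in the paper. The following are illustrative assumptions rather than details taken from the source: the helper names `gaussian_sigma` and `ssp_linear_regression`, the row-norm bound `bound_x`, the label bound `bound_y`, the add/remove-one neighbouring relation, the even split of the (ε, δ) budget, the classic Gaussian-mechanism calibration, and the fixed, caller-chosen ridge parameter `lam`. The adaptive variant is meant to exploit data-dependent quantities, and this sketch does not attempt that.

```python
import numpy as np


def gaussian_sigma(sensitivity, eps, delta):
    """Noise scale of the classic Gaussian mechanism (stated for 0 < eps < 1)."""
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps


def ssp_linear_regression(X, y, eps, delta, bound_x, bound_y, lam, rng=None):
    """Plain (non-adaptive) sufficient statistics perturbation, as a sketch.

    Illustrative assumptions, not the paper's AdaSSP:
      * neighbouring data sets differ by adding or removing one row;
      * rows are clipped to ||x_i||_2 <= bound_x and labels to |y_i| <= bound_y;
      * the (eps, delta) budget is split evenly over two Gaussian releases;
      * lam is a fixed ridge parameter supplied by the caller.
    """
    if not 0.0 < eps < 2.0:
        raise ValueError("each half-budget eps/2 must lie in (0, 1)")
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d = X.shape[1]

    # Clip each row and label so that the assumed domain bounds actually hold.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X * np.minimum(1.0, bound_x / np.maximum(norms, 1e-12))
    y = np.clip(y, -bound_y, bound_y)

    eps_half, delta_half = eps / 2.0, delta / 2.0

    # Release X^T X. One row changes it by x x^T, and the upper triangle of
    # x x^T has L2 norm at most ||x||^2 <= bound_x**2.
    sigma_xx = gaussian_sigma(bound_x ** 2, eps_half, delta_half)
    upper = np.triu(rng.normal(scale=sigma_xx, size=(d, d)))
    noise_xx = upper + np.triu(upper, 1).T  # mirror so the noise is symmetric
    xtx_priv = X.T @ X + noise_xx

    # Release X^T y. One row changes it by y_i x_i, with norm <= bound_x * bound_y.
    sigma_xy = gaussian_sigma(bound_x * bound_y, eps_half, delta_half)
    xty_priv = X.T @ y + rng.normal(scale=sigma_xy, size=d)

    # Post-processing (no extra privacy cost): solve the noisy ridge normal equations.
    return np.linalg.solve(xtx_priv + lam * np.eye(d), xty_priv)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 100_000, 3
    X = rng.normal(size=(n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # rows on the unit sphere
    theta_true = np.array([1.0, -2.0, 0.5])
    y = X @ theta_true + 0.1 * rng.normal(size=n)
    theta_hat = ssp_linear_regression(X, y, eps=1.0, delta=1e-6,
                                      bound_x=1.0, bound_y=3.0, lam=1.0, rng=rng)
    print(theta_hat)
```

The even budget split and the classic Gaussian calibration are chosen here for readability, and tighter analyses of the Gaussian mechanism exist. With many rows relative to the noise scale, the estimate lands close to `theta_true`. Shrinking `eps` or `n` makes the perturbation, and the role of `lam`, much more visible.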
