Private Convex Empirical Risk Minimization and High-dimensional Regression

We consider differentially private algorithms for convex empirical risk minimization (ERM). Differential privacy (Dwork et al., 2006b) is a recently introduced notion of privacy which guarantees that an algorithm’s output does not depend significantly on the data of any single individual in the dataset. This is crucial in fields that handle sensitive data, such as genomics, collaborative filtering, and economics. Our motivation is the design of private algorithms for sparse learning problems, in which one aims to find solutions (e.g., regression parameters) with few non-zero coefficients. To this end: (a) We significantly extend the analysis of the “objective perturbation” algorithm of Chaudhuri et al. (2011) for convex ERM problems. We show that their method can be modified to use less noise (and hence be more accurate), and to apply to problems with hard constraints and non-differentiable regularizers. We also give a tighter, data-dependent analysis of the additional error introduced by their method. A key tool in our analysis is a new nontrivial limit theorem for differential privacy which is of independent interest: if a sequence of differentially private algorithms converges, in a weak sense, then the limit algorithm is also differentially private. In particular, our methods give the best known algorithms for differentially private linear regression. These methods work in settings where the number of parameters p is less than the number of samples n. (b) We give the first two private algorithms for sparse regression problems in high-dimensional settings, where p is much larger than n. We analyze their performance for linear regression: under standard assumptions on the data, our algorithms have vanishing empirical risk for n = poly(s, log p) when there exists a good regression vector with s nonzero coefficients. Our algorithms demonstrate that randomized algorithms for sparse regression problems can be both stable and accurate, a combination which is impossible for deterministic algorithms.
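For reference, the privacy guarantee and the “objective perturbation” idea mentioned above can be written schematically as follows. This is a sketch in standard textbook notation; the symbols $\mathcal{A}$, $D$, $D'$, $S$, $\epsilon$, $\delta$, $\ell$, $r$, $\mathcal{C}$, and $b$ are assumptions introduced here for illustration and do not appear in the abstract. The exact noise distribution and scaling used by the paper are not reproduced.

```latex
% Standard (epsilon, delta)-differential privacy (illustrative notation).
% D and D' are datasets of n records that differ in one individual's record.
A randomized algorithm $\mathcal{A}$ is $(\epsilon,\delta)$-differentially private if,
for all such neighboring $D, D'$ and all measurable sets of outputs $S$,
\[
  \Pr[\mathcal{A}(D) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{A}(D') \in S] + \delta .
\]

% Schematic form of objective perturbation: a random linear term is added to
% the ERM objective before minimizing. Here \ell is a convex loss, r a convex
% (possibly non-differentiable) regularizer, \mathcal{C} a convex constraint
% set, and b a random noise vector whose distribution is calibrated to
% \epsilon (and \delta); that calibration is deliberately left unspecified.
\[
  \hat{\theta} \;=\; \arg\min_{\theta \in \mathcal{C}}\;
  \frac{1}{n}\sum_{i=1}^{n} \ell(\theta; d_i) \;+\; r(\theta)
  \;+\; \frac{\langle b, \theta \rangle}{n} .
\]
```

Setting $\delta = 0$ in the first display recovers pure $\epsilon$-differential privacy.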

[1]  A. Zemanian.  Distribution Theory and Transform Analysis: An Introduction to Generalized Functions, with Applications , 1965 .

[2]  L. E. Clarke,et al.  Probability and Measure , 1980 .

[3]  Stephen J. Wright,et al.  Numerical Optimization , 2018, Fundamental Statistical Inference.

[4]  Irit Dinur,et al.  Revealing information while preserving privacy , 2003, PODS.

[5]  B. Rao,et al.  ℓ₀-norm Minimization for Basis Selection , 2004, NIPS 2004.

[6]  Moni Naor,et al.  Our Data, Ourselves: Privacy Via Distributed Noise Generation , 2006, EUROCRYPT.

[7]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[8]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[9]  Sofya Raskhodnikova,et al.  Smooth sensitivity and sampling in private data analysis , 2007, STOC '07.

[10]  Sanjoy Dasgupta,et al.  A Probabilistic Analysis of EM for Mixtures of Separated, Spherical Gaussians , 2007, J. Mach. Learn. Res.

[11]  Kunal Talwar,et al.  Mechanism Design via Differential Privacy , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[12]  Adam D. Smith,et al.  Composition attacks and auxiliary information in data privacy , 2008, KDD.

[13]  Martin J. Wainwright,et al.  A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers , 2009, NIPS.

[14]  Cynthia Dwork,et al.  Differential privacy and robust statistics , 2009, STOC '09.

[15]  Ohad Shamir,et al.  Stochastic Convex Optimization , 2009, COLT.

[16]  Adam D. Smith,et al.  Discovering frequent patterns in sensitive data , 2010, KDD.

[17]  Moni Naor,et al.  On the Difficulties of Disclosure Prevention in Statistical Databases or The Case for Differential Privacy , 2010, J. Priv. Confidentiality.

[18]  Ashwin Machanavajjhala,et al.  No free lunch in data privacy , 2011, SIGMOD '11.

[19]  Adam D. Smith,et al.  Privacy-preserving statistical estimation with optimal convergence rates , 2011, STOC '11.

[20]  Anand D. Sarwate,et al.  Differentially Private Empirical Risk Minimization , 2009, J. Mach. Learn. Res.