Differentially Private Coordinate Descent for Composite Empirical Risk Minimization

Machine learning models can leak information about the data used to train them. Differentially Private (DP) variants of optimization algorithms like Stochastic Gradient Descent (DP-SGD) have been designed to mitigate this, inducing a trade-off between privacy and utility. In this paper, we propose a new method for composite Differentially Private Empirical Risk Minimization (DP-ERM): Differentially Private proximal Coordinate Descent (DP-CD). We analyze its utility through a novel theoretical analysis of inexact coordinate descent, and highlight regimes where DP-CD outperforms DP-SGD thanks to the possibility of using larger step sizes. We also prove new lower bounds for composite DP-ERM under coordinate-wise regularity assumptions, which our algorithm nearly matches in some settings. In practical implementations, the coordinate-wise nature of DP-CD updates demands special care in choosing the clipping thresholds used to bound individual contributions to the gradients. A natural parameterization of these thresholds emerges from our theory, limiting the addition of unnecessarily large noise without requiring coordinate-wise hyperparameter tuning or extra computational cost.
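The kind of update described above can be illustrated with a short sketch: at each step, one coordinate of the parameter vector is updated from per-sample gradient contributions that are clipped coordinate-wise, perturbed with Gaussian noise scaled to the clipping threshold, and passed through a proximal step (here, soft-thresholding for an L1 penalty). This is a minimal illustration under stated assumptions, not the paper's exact algorithm or parameterization: the names (dp_cd, gamma, clip, sigma) are hypothetical, and the calibration of the noise scale sigma to a target privacy budget is omitted.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * |.|: soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def dp_cd(X, y, lam, T, gamma, clip, sigma, rng=None):
    """Sketch of differentially private proximal coordinate descent for
    (1/(2n)) * ||X w - y||^2 + lam * ||w||_1.

    gamma[j]: per-coordinate step size, clip[j]: per-coordinate clipping
    threshold, sigma: Gaussian noise multiplier (privacy calibration omitted).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(T):
        j = rng.integers(p)                       # pick a coordinate at random
        residual = X @ w - y                      # per-sample residuals
        per_sample_grad = X[:, j] * residual      # d(loss_i)/d(w_j) for each sample
        clipped = np.clip(per_sample_grad, -clip[j], clip[j])
        noise = sigma * clip[j] * rng.normal()    # Gaussian noise scaled to the threshold
        g_j = (clipped.sum() + noise) / n         # noisy gradient for coordinate j
        # Proximal (soft-thresholding) update on coordinate j only.
        w[j] = soft_threshold(w[j] - gamma[j] * g_j, gamma[j] * lam)
    return w

# Toy usage with assumed hyperparameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = np.concatenate([np.ones(3), np.zeros(7)])
y = X @ w_true + 0.1 * rng.normal(size=200)
w_priv = dp_cd(X, y, lam=0.1, T=500,
               gamma=np.full(10, 0.5), clip=np.full(10, 5.0), sigma=1.0, rng=rng)
print(np.round(w_priv, 2))
```

In this sketch the step sizes and clipping thresholds are uniform across coordinates; the abstract's point is that both can be set coordinate-wise, which is what the proposed parameterization of the thresholds addresses.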
