Learning rate adaptation for federated and differentially private learning

We propose an algorithm for adapting the learning rate of stochastic gradient descent (SGD) that avoids the need for a validation set. The adaptiveness is based on the technique of extrapolation: to estimate the error against the gradient flow underlying SGD, we compare the result of one full step with that of two half-steps. The algorithm is applied in two separate frameworks: federated and differentially private learning. Using examples of deep neural networks, we empirically show that the adaptive algorithm is competitive with manually tuned, commonly used optimisation methods for differentially private training. We also show that, unlike commonly used optimisation methods, it works robustly in the federated learning setting.
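A minimal NumPy sketch of the step-doubling idea described above, i.e. comparing one full SGD step against two half-steps to estimate the local error against the gradient flow and then rescaling the learning rate. The function and parameter names (`adaptive_sgd_step`, `tol`, `safety`, `min_factor`, `max_factor`) are illustrative assumptions, not the paper's exact algorithm, and the federated and differentially private mechanics are omitted.

```python
import numpy as np

def adaptive_sgd_step(w, grad_fn, lr, tol=1e-3, safety=0.9,
                      min_factor=0.2, max_factor=2.0):
    """One SGD step with learning-rate adaptation via step doubling.

    The local error of an Euler-type (SGD) step against the underlying
    gradient flow is estimated by comparing one full step of size `lr`
    with two consecutive half-steps, in the spirit of extrapolation
    methods for ODE solvers.
    """
    g = grad_fn(w)

    # One full step of size lr.
    w_full = w - lr * g

    # Two half-steps of size lr / 2 (the second uses a fresh gradient).
    w_half = w - 0.5 * lr * g
    w_two = w_half - 0.5 * lr * grad_fn(w_half)

    # Error estimate: discrepancy between the two approximations.
    err = np.linalg.norm(w_full - w_two)

    # Euler is first order, so the local error scales as lr**2;
    # the usual step-size controller therefore uses exponent 1/2.
    factor = safety * (tol / err) ** 0.5 if err > 0 else max_factor
    new_lr = lr * min(max_factor, max(min_factor, factor))

    # Keep the more accurate two-half-step result.
    return w_two, new_lr

# Toy usage on the quadratic objective f(w) = 0.5 * ||w||^2,
# whose gradient is simply w.
w, lr = np.ones(10), 0.5
for _ in range(100):
    w, lr = adaptive_sgd_step(w, lambda v: v, lr)
```

In this sketch the learning rate grows when the full step and the two half-steps agree (error below `tol`) and shrinks when they diverge, which is the same accept/rescale logic used by step-size controllers in extrapolation methods for initial value problems.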
