Privacy-Preserving Federated Learning via Normalized (instead of Clipped) Updates

Differentially private federated learning (FL) entails bounding the sensitivity to each client's update. The customary approach used in practice for bounding sensitivity is to clip the client updates, which is simply projection onto an ℓ2 ball of some radius (called the clipping threshold) centered at the origin. However, clipping introduces a bias that depends on the clipping threshold, and its impact on convergence has not been properly analyzed in the FL literature. In this work, we propose a simpler alternative for bounding sensitivity: normalization, i.e., using only the unit vector along each client update and completely discarding the magnitude information. We call this algorithm DP-NormFedAvg and show that it has the same order-wise convergence rate as FedAvg on smooth quasar-convex functions (an important class of non-convex functions for modeling optimization of deep neural networks), modulo the noise variance term (due to privacy). Further, assuming that the per-sample client losses obey a strong-growth type of condition, we show that with high probability the sensitivity reduces by a factor of O(1/m), where m is the minimum number of samples within a client, compared to its worst-case value. Using this high-probability sensitivity value enables us to reduce the iteration complexity of DP-NormFedAvg by a factor of O(1/m^2), at the expense of an exponentially small degradation in the privacy guarantee. We also corroborate our theory with experiments on neural networks.
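The two sensitivity-bounding operations contrasted above, clipping (projection onto an ℓ2 ball) and normalization (keeping only the unit vector), can be sketched as follows. This is a minimal illustrative sketch in NumPy, not the paper's implementation; the function names and the simple Gaussian-mechanism aggregation are assumptions for illustration.

```python
import numpy as np

def clip_update(update, threshold):
    """Standard DP-FL clipping: project `update` onto the l2 ball of
    radius `threshold` centered at the origin."""
    norm = np.linalg.norm(update)
    if norm == 0.0:
        return update
    return update * min(1.0, threshold / norm)

def normalize_update(update):
    """DP-NormFedAvg-style normalization: keep only the unit vector
    along `update`, discarding its magnitude entirely."""
    norm = np.linalg.norm(update)
    if norm == 0.0:
        return update
    return update / norm

def private_average(bounded_updates, sensitivity, noise_multiplier, rng):
    """Average sensitivity-bounded client updates with Gaussian noise
    calibrated to the per-update sensitivity (illustrative only)."""
    total = np.sum(bounded_updates, axis=0)
    noise = rng.normal(0.0, noise_multiplier * sensitivity, size=total.shape)
    return (total + noise) / len(bounded_updates)
```

With normalization, every bounded update has norm exactly 1 (unless it is zero), so the sensitivity is a fixed constant rather than a tunable clipping threshold.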
