Robust Aggregation for Federated Learning

We present a novel approach to federated learning that makes its aggregation process more robust to potential poisoning of the local data or model parameters of participating devices. The proposed approach, Robust Federated Aggregation (RFA), aggregates updates using the geometric median, which can be computed efficiently with a Weiszfeld-type algorithm. RFA is agnostic to the level of corruption and aggregates model updates without revealing each device’s individual contribution. We establish convergence of the robust federated learning algorithm for the stochastic learning of additive models with least squares. We also offer two variants of RFA: a faster one that uses one-step robust aggregation, and one with on-device personalization. We present experimental results with additive models and deep networks on three tasks in computer vision and natural language processing. The experiments show that RFA is competitive with classical aggregation when the level of corruption is low, while demonstrating greater robustness under high corruption.
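For model updates w_1, ..., w_m with positive weights α_1, ..., α_m, the geometric median is argmin_z Σ_i α_i ‖z − w_i‖. A Weiszfeld-type method approximates it by repeatedly re-weighting each update in inverse proportion to its distance from the current estimate and taking a weighted average. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. It assumes a smoothed iteration in which each distance is floored at a small constant `nu`, starts from the ordinary weighted mean, and stops on a tolerance `tol`. The names `geometric_median`, `weighted_mean`, `distance`, `nu`, `max_iter`, and `tol` are all hypothetical. Setting `max_iter=1` is only our rough stand-in for a one-step variant.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def weighted_mean(points: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Weighted average of equal-length vectors (weights need not sum to 1)."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[j] for p, w in zip(points, weights)) / total for j in range(dim)]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def geometric_median(
    points: Sequence[Vector],
    alphas: Optional[Sequence[float]] = None,
    nu: float = 1e-6,      # hypothetical smoothing floor on each distance
    max_iter: int = 100,   # hypothetical iteration budget
    tol: float = 1e-8,     # hypothetical stopping tolerance
) -> Vector:
    """Approximate argmin_z sum_i alpha_i * ||z - w_i|| with a smoothed Weiszfeld iteration."""
    if alphas is None:
        alphas = [1.0] * len(points)
    # Assumed starting point: the ordinary weighted mean of the updates.
    z = weighted_mean(points, alphas)
    for _ in range(max_iter):
        # Down-weight updates far from the current estimate; flooring the
        # distance at nu avoids division by zero when z hits a data point.
        betas = [a / max(nu, distance(z, p)) for a, p in zip(alphas, points)]
        z_new = weighted_mean(points, betas)
        if distance(z_new, z) <= tol:
            return z_new
        z = z_new
    return z


if __name__ == "__main__":
    honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05]]
    updates = honest + [[100.0, -100.0]]  # one corrupted update
    print("mean:", weighted_mean(updates, [1.0] * len(updates)))
    print("geometric median:", geometric_median(updates))
    print("one step:", geometric_median(updates, max_iter=1))
```

In this toy run, the plain mean is pulled far toward the single corrupted update, while the geometric median stays inside the cluster of honest updates. Each iteration reduces to a weighted average. In a federated setting, one could therefore broadcast the current estimate, let each device compute its own weight, and aggregate only weighted sums. That property is what makes a secure-aggregation protocol applicable, although this division of labor is our illustration rather than a description of RFA's protocol.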
