An Operator Splitting View of Federated Learning

Over the past few years, the federated learning (FL) community has witnessed a proliferation of new FL algorithms. However, our understanding of the theory of FL is still fragmented, and a thorough, formal comparison of these algorithms remains elusive. Motivated by this gap, we show that many of the existing FL algorithms can be understood from an operator splitting point of view. This unification allows us to compare different algorithms with ease, to refine previous convergence results, and to uncover new algorithmic variants. In particular, our analysis reveals the vital role played by the step size in FL algorithms. The unification also leads to a streamlined and economical way to accelerate FL algorithms, without incurring any communication overhead. We perform numerical experiments on both convex and nonconvex models to validate our findings.
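To make the operator splitting correspondence concrete, the sketch below records the standard consensus reformulation of the FL objective and how two well-known algorithms arise as compositions of a local operator with server averaging. The notation (the stacked variable x, the step size η, the projection P) is our own illustrative choice and not necessarily the paper's:

```latex
\documentclass{article}
\usepackage{amsmath}
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator{\prox}{prox}
\begin{document}
Stack the client iterates as $x = (x_1,\dots,x_M)$, let
$F(x) = \sum_{i=1}^{M} f_i(x_i)$, and let $P$ denote the projection onto the
consensus subspace $\{x : x_1 = \cdots = x_M\}$, which is exactly server
averaging. One communication round then composes a local operator with $P$:
\begin{align*}
  \text{FedAvg (one local step):} \quad
    x^{t+1} &= P\bigl(x^t - \eta \nabla F(x^t)\bigr)
    && \text{forward (gradient) step, then average,}\\
  \text{FedProx:} \quad
    x^{t+1} &= P\bigl(\prox_{\eta F}(x^t)\bigr)
    && \text{backward (proximal) step, then average,}
\end{align*}
where $\prox_{\eta F}(z) = \argmin_x F(x) + \tfrac{1}{2\eta}\lVert x - z\rVert^2$.
\end{document}
```

Under this lens, FedAvg with multiple local steps iterates the forward operator before projecting, and FedSplit's reflect-then-average update corresponds to Peaceman-Rachford splitting; the step size η controls both the fixed points of these compositions and their contraction behavior, which is one reading of the abstract's claim about the vital role of the step size.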
