Efficient Algorithms for Federated Saddle Point Optimization

We consider strongly convex-concave minimax problems in the federated setting, where the communication constraint is the main bottleneck. When clients are arbitrarily heterogeneous, a simple Minibatch Mirror-prox achieves the best performance. As the clients become more homogeneous, performing multiple local gradient updates at the clients significantly improves upon Minibatch Mirror-prox by communicating less frequently. Our goal is to design an algorithm that harnesses the benefit of client similarity while recovering the Minibatch Mirror-prox performance under arbitrary heterogeneity (up to log factors). We give the first federated minimax optimization algorithm that achieves this goal. The main idea is to combine (i) SCAFFOLD (an algorithm that performs variance reduction across clients for convex optimization) to erase the worst-case dependency on heterogeneity and (ii) Catalyst (a framework for acceleration based on modifying the objective) to accelerate convergence without amplifying client drift. We prove that this algorithm achieves our goal, and we include experiments that validate the theory.
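
To make the combination concrete, here is a minimal, purely illustrative Python sketch of how the two ingredients compose: a Catalyst-style outer loop repeatedly regularizes the saddle-point operator around an anchor, and each regularized subproblem is solved by SCAFFOLD-style communication rounds of local updates with control variates. The toy quadratic objective, the plain descent-ascent local steps (standing in for mirror-prox updates), and all names (`op`, `scaffold_round`, `tau`, `K`) are assumptions for illustration, not the paper's algorithm or its tuning.

```python
import numpy as np

# Toy setup: m clients, each holding a strongly convex-concave objective
#   f_i(x, y) = (mu/2)||x||^2 + x^T A_i y + b_i^T x - c_i^T y - (mu/2)||y||^2,
# with saddle-point operator F_i(x, y) = [mu*x + A_i y + b_i, mu*y - A_i^T x + c_i].
rng = np.random.default_rng(0)
m, d, mu = 8, 5, 1.0
A = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(m)]
b = [rng.normal(size=d) for _ in range(m)]     # heterogeneous linear term in x
cv = [rng.normal(size=d) for _ in range(m)]    # heterogeneous linear term in y

def op(i, z):
    """Monotone saddle-point operator of client i at z = (x, y)."""
    x, y = z[:d], z[d:]
    return np.concatenate([mu * x + A[i] @ y + b[i], mu * y - A[i].T @ x + cv[i]])

def scaffold_round(z, ctrl, lr, K, tau, anchor):
    """One communication round: K local steps per client on the
    Catalyst-regularized operator F_i(z) + tau*(z - anchor)."""
    c_bar = np.mean(ctrl, axis=0)
    finals, new_ctrl = [], []
    for i in range(m):
        zi = z.copy()
        for _ in range(K):
            # Control variates steer the local direction toward the global
            # operator, removing worst-case client drift.
            zi -= lr * (op(i, zi) + tau * (zi - anchor) - ctrl[i] + c_bar)
        new_ctrl.append(ctrl[i] - c_bar + (z - zi) / (K * lr))  # SCAFFOLD "Option II"
        finals.append(zi)
    return np.mean(finals, axis=0), new_ctrl

# Catalyst-style outer loop: inexactly solve each tau-regularized subproblem,
# then move the anchor to the approximate solution (proximal-point view).
z = anchor = np.zeros(2 * d)
ctrl = [np.zeros(2 * d) for _ in range(m)]
tau, lr, K = 1.0, 0.05, 10
for outer in range(31):
    for _ in range(5):  # a few communication rounds per subproblem
        z, ctrl = scaffold_round(z, ctrl, lr, K, tau, anchor)
    anchor = z.copy()
    if outer % 10 == 0:
        res = np.linalg.norm(np.mean([op(i, z) for i in range(m)], axis=0))
        print(f"outer {outer:2d}  ||F(z)|| = {res:.3e}")
```

In this sketch the control variates cancel the fixed heterogeneous terms b_i and c_i across local steps, while the tau-regularization keeps every subproblem strongly monotone around the anchor, which is what lets local steps help without amplifying drift.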

[1] Francis R. Bach, et al. Stochastic Variance Reduction Methods for Saddle-Point Problems, 2016, NIPS.

[2] Martin Jaggi, et al. Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning, 2020, arXiv:2008.03606.

[3] Nathan Srebro, et al. Minibatch vs Local SGD for Heterogeneous Distributed Learning, 2020, NeurIPS.

[4] Konstantin Mishchenko, et al. Tighter Theory for Local SGD on Identical and Heterogeneous Data, 2020, AISTATS.

[5] Sébastien Bubeck, et al. Convex Optimization: Algorithms and Complexity, 2014, Found. Trends Mach. Learn.

[6] Sebastian U. Stich, et al. Local SGD Converges Fast and Communicates Little, 2018, ICLR.

[7] Aryan Mokhtari, et al. A Unified Analysis of Extra-gradient and Optimistic Gradient Methods for Saddle Point Problems: Proximal Point Approach, 2019, AISTATS.

[8] Sashank J. Reddi, et al. SCAFFOLD: Stochastic Controlled Averaging for Federated Learning, 2019, ICML.

[9] Eduard A. Gorbunov, et al. Local SGD: Unified Theory and New Efficient Methods, 2020, AISTATS.

[10] Ohad Shamir, et al. Is Local SGD Better than Minibatch SGD?, 2020, ICML.

[11] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.

[12] R. T. Rockafellar. Monotone Operators Associated with Saddle Functions and Minimax Problems, 1970.

[13] Tengyu Ma, et al. Federated Accelerated Stochastic Gradient Descent, 2020, NeurIPS.

[14] Zaïd Harchaoui, et al. A Universal Catalyst for First-Order Optimization, 2015, NIPS.

[15] H. Brendan McMahan, et al. Generative Models for Effective ML on Private, Decentralized Datasets, 2019, ICLR.

[16] Niao He, et al. A Catalyst Framework for Minimax Optimization, 2020, NeurIPS.

[17] Kevin Tian, et al. Variance Reduction for Matrix Games, 2019, NeurIPS.

[18] Enhong Chen, et al. Variance Reduced Local SGD with Lower Communication Complexity, 2019, arXiv.

[19] P. Tseng. On linear convergence of iterative methods for the variational inequality problem, 1995.

[20] Blaise Agüera y Arcas, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016, AISTATS.

[21] Shuzhong Zhang, et al. On lower iteration complexity bounds for the convex concave saddle point problems, 2019, Math. Program.

[22] Aleksandr Beznosikov, et al. Local SGD for Saddle-Point Problems, 2020, arXiv.

[23] Wei Hu, et al. Linear Convergence of the Primal-Dual Gradient Method for Convex-Concave Saddle Point Problems without Strong Convexity, 2018, AISTATS.

[24] Arkadi Nemirovski. Prox-Method with Rate of Convergence O(1/t) for Variational Inequalities with Lipschitz Continuous Monotone Operators and Smooth Convex-Concave Saddle Point Problems, 2004, SIAM J. Optim.