A Communication-efficient Algorithm with Linear Convergence for Federated Minimax Learning

In this paper, we study a large-scale multi-agent minimax optimization problem, which models many interesting applications in statistical learning and game theory, including Generative Adversarial Networks (GANs). The overall objective is a sum of agents' private local objective functions. We first analyze an important special case, the empirical minimax problem, where the overall objective approximates a true population minimax risk by statistical samples. We provide generalization bounds for learning with this objective through Rademacher complexity analysis. We then focus on the federated setting, where agents can perform local computation and communicate with a central server. Most existing federated minimax algorithms either require communication per iteration or lack performance guarantees, with the exception of Local Stochastic Gradient Descent Ascent (SGDA), a multiple-local-update descent-ascent algorithm that guarantees convergence under a diminishing stepsize. By analyzing Local SGDA under the ideal condition of no gradient noise, we show that in general it cannot guarantee exact convergence with constant stepsizes and thus suffers from slow rates of convergence. To tackle this issue, we propose FedGDA-GT, an improved Federated (Fed) Gradient Descent Ascent (GDA) method based on Gradient Tracking (GT). When the local objectives are Lipschitz smooth and strongly-convex-strongly-concave, we prove that FedGDA-GT converges linearly with a constant stepsize to a global $\epsilon$-approximate solution within $\mathcal{O}(\log(1/\epsilon))$ rounds of communication, which matches the time complexity of the centralized GDA method. Finally, we show numerically that FedGDA-GT outperforms Local SGDA.
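The mechanism described above — multiple local descent-ascent steps corrected by a tracked global gradient, so that a constant stepsize can still give exact convergence despite heterogeneous local objectives — can be sketched on a toy strongly-convex-strongly-concave problem. The snippet below is a minimal illustration only, not the paper's exact FedGDA-GT update: the SCAFFOLD-style correction term, the quadratic local objectives, and all parameter values are assumptions chosen for the sketch.

```python
import numpy as np

# Toy strongly-convex-strongly-concave local objectives:
#   f_i(x, y) = a_i x^2 / 2 - b_i y^2 / 2 + c_i x y
# The global saddle point of sum_i f_i is the origin (0, 0).
rng = np.random.default_rng(0)
n_agents = 5
a = rng.uniform(1.0, 2.0, n_agents)
b = rng.uniform(1.0, 2.0, n_agents)
c = rng.uniform(-0.5, 0.5, n_agents)

def grad(i, x, y):
    """Return (∇_x f_i, ∇_y f_i) for agent i."""
    return a[i] * x + c[i] * y, -b[i] * y + c[i] * x

def fedgda_gt(rounds=50, local_steps=10, eta=0.05):
    x, y = 1.0, 1.0  # server iterate
    for _ in range(rounds):
        # Tracked global gradient at the current server iterate.
        gx = np.mean([grad(i, x, y)[0] for i in range(n_agents)])
        gy = np.mean([grad(i, x, y)[1] for i in range(n_agents)])
        xs, ys = [], []
        for i in range(n_agents):
            xi, yi = x, y
            # Correction so local updates follow the global gradient
            # direction (assumed SCAFFOLD-style drift correction).
            cx = gx - grad(i, x, y)[0]
            cy = gy - grad(i, x, y)[1]
            for _ in range(local_steps):
                gix, giy = grad(i, xi, yi)
                xi -= eta * (gix + cx)  # descent on x
                yi += eta * (giy + cy)  # ascent on y
            xs.append(xi)
            ys.append(yi)
        x, y = np.mean(xs), np.mean(ys)  # server averaging
    return x, y

x_star, y_star = fedgda_gt()
print(abs(x_star), abs(y_star))  # both should be close to 0
```

Note that without the correction terms `cx`, `cy` (i.e., plain local GDA with averaging), a constant stepsize leaves a residual bias whenever the local objectives differ, which is the failure mode of Local SGDA that the abstract highlights; the correction makes the global saddle point an exact fixed point of each agent's local recursion.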
