A Faster Decentralized Algorithm for Nonconvex Minimax Problems

In this paper, we study the nonconvex-strongly-concave minimax optimization problem in the decentralized setting. Minimax problems are attracting increasing attention because of popular practical applications such as policy evaluation and adversarial training. As training data grow larger, distributed training has been broadly adopted in machine learning tasks. Recent research shows that decentralized data-parallel training is especially promising, because it achieves efficient communication and avoids both the bottleneck at a central node and the latency of low-bandwidth networks. However, decentralized minimax problems have seldom been studied in the literature, and existing methods suffer from very high gradient complexity. To address this challenge, we propose a new, faster decentralized algorithm for nonconvex minimax problems, named DM-HSGD, which uses the variance-reduction technique of hybrid stochastic gradient descent. We prove that DM-HSGD achieves a stochastic first-order oracle (SFO) complexity of O(κ³ε⁻³) for the decentralized stochastic nonconvex-strongly-concave problem to find an ε-stationary point, which improves on the best existing theoretical results. Moreover, we prove that our algorithm achieves linear speedup with respect to the number of workers. Our experiments in decentralized settings show the superior performance of the new algorithm.
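To illustrate the hybrid (STORM-style) stochastic gradient estimator that underlies DM-HSGD, below is a minimal single-machine sketch applied to gradient descent ascent on a toy nonconvex-strongly-concave objective. The objective, step sizes, momentum parameter, and single-sample initialization are illustrative assumptions rather than the paper's setup, and the decentralized components (gossip averaging of iterates across workers with a doubly stochastic mixing matrix) are omitted.

```python
import numpy as np

# Toy problem: f(x, y; xi) = y^T (A x + sigma*xi) + sum(cos(x)) - (mu/2)||y||^2,
# which is nonconvex in x and strongly concave in y.  All constants below are
# illustrative assumptions, not the paper's experimental configuration.
rng = np.random.default_rng(0)
d, mu, sigma = 5, 1.0, 0.1
A = rng.standard_normal((d, d))

def grads(x, y, xi):
    """Stochastic gradients of f(x, y; xi) with respect to x and y."""
    g_x = A.T @ y - np.sin(x)            # d/dx [ y^T A x + sum(cos(x)) ]
    g_y = A @ x + sigma * xi - mu * y    # d/dy [ y^T (A x + sigma*xi) - mu/2 ||y||^2 ]
    return g_x, g_y

eta_x, eta_y, beta = 0.01, 0.05, 0.1     # assumed step sizes and momentum parameter
x, y = np.zeros(d), np.zeros(d)
xi0 = rng.standard_normal(d)
v_x, v_y = grads(x, y, xi0)              # estimator init from a single sample (a large batch could be used instead)

for t in range(2000):
    x_new = x - eta_x * v_x              # descent step on the min variable
    y_new = y + eta_y * v_y              # ascent step on the max variable

    xi = rng.standard_normal(d)          # one fresh sample, reused at both iterates
    gx_new, gy_new = grads(x_new, y_new, xi)
    gx_old, gy_old = grads(x, y, xi)

    # Hybrid estimator: fresh stochastic gradient plus a (1 - beta)-weighted
    # recursive correction evaluated on the same sample at the old iterate.
    v_x = gx_new + (1.0 - beta) * (v_x - gx_old)
    v_y = gy_new + (1.0 - beta) * (v_y - gy_old)

    x, y = x_new, y_new

print("final gradient-estimate norm ||v_x||:", np.linalg.norm(v_x))
```

In a decentralized version, each worker would hold its own copies of (x, y, v_x, v_y), run the same recursion on local samples, and mix its iterates with neighbors' iterates through the network's mixing matrix after each update; the single-sample recursion above is what allows the method to avoid large-batch gradient evaluations.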
