Delayed Stochastic Algorithms for Distributed Weakly Convex Optimization

This paper studies delayed stochastic algorithms for weakly convex optimization in a distributed network with workers connected to a master node. More specifically, we consider a structured stochastic weakly convex objective that is the composition of a convex function and a smooth nonconvex function. Recently, Xu et al. (2022) showed that an inertial stochastic subgradient method converges at a rate of $\mathcal{O}(\tau/\sqrt{K})$, which suffers a significant penalty from the maximum information delay $\tau$. To alleviate this issue, we propose a new delayed stochastic prox-linear ($\texttt{DSPL}$) method in which the master performs the proximal update of the parameters and the workers only need to linearly approximate the inner smooth function. Somewhat surprisingly, we show that the delays only affect the high-order term in the complexity bound and are therefore negligible after a certain number of $\texttt{DSPL}$ iterations. Moreover, to further improve empirical performance, we propose a delayed stochastic extrapolated prox-linear ($\texttt{DSEPL}$) method, which employs Polyak-type momentum to speed up convergence. Building on the tools developed for analyzing $\texttt{DSPL}$, we also provide an improved analysis of the delayed stochastic subgradient method ($\texttt{DSGD}$). In particular, for general weakly convex problems, we show that the convergence of $\texttt{DSGD}$ depends only on the expected delay.
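
To make the master-worker split concrete, the following is a minimal single-machine sketch of a $\texttt{DSPL}$-style update, not the paper's implementation. It assumes the robust phase retrieval instance with $h(z)=|z|$ and $c(x;a,b)=\langle a,x\rangle^2-b$, a randomly simulated delay, and illustrative choices for the step size $\gamma$ and the helper name `prox_linear_step`. The worker supplies only the linearization of $c$ at a stale iterate, and the master solves the resulting proximal subproblem, which admits a closed form for this choice of $h$.

```python
# Minimal sketch (not the paper's code) of a delayed stochastic prox-linear (DSPL)-style
# update for robust phase retrieval: minimize E_{(a,b)} |<a, x>^2 - b|, i.e. the convex
# h(z) = |z| composed with the smooth nonconvex map c(x; a, b) = <a, x>^2 - b.
# The delay model, step size, and helper names below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, K, gamma, tau_max = 20, 2000, 0.05, 10      # dimension, iterations, step size, max delay

x_true = rng.normal(size=d)                    # ground-truth signal (defines the samples)
x = rng.normal(size=d)                         # master's current iterate
history = [x.copy()]                           # past iterates, used to simulate stale workers


def prox_linear_step(x_cur, y_stale, a, b, gamma):
    """Master step: argmin_x |c(y) + <grad c(y), x - y>| + ||x - x_cur||^2 / (2 * gamma).

    The worker only reports the linearization (c(y), grad c(y)) taken at the stale
    iterate y; the master keeps the proximal term centered at its current iterate.
    For h = |.| the subproblem has the closed form below.
    """
    c_val = np.dot(a, y_stale) ** 2 - b        # c(y; a, b)
    g = 2.0 * np.dot(a, y_stale) * a           # gradient of c at y
    offset = c_val - np.dot(g, y_stale)        # affine model: l(x) = <g, x> + offset
    r = np.dot(g, x_cur) + offset              # model value at the proximal center
    gg = np.dot(g, g) + 1e-12
    if r > gamma * gg:
        return x_cur - gamma * g
    if r < -gamma * gg:
        return x_cur + gamma * g
    return x_cur - (r / gg) * g                # otherwise land on {x : l(x) = 0}


for k in range(K):
    delay = rng.integers(0, min(tau_max, len(history)))  # simulated information delay
    y_stale = history[-1 - delay]                        # worker linearizes at a stale iterate
    a = rng.normal(size=d)                               # worker draws a fresh sample
    b = np.dot(a, x_true) ** 2
    x = prox_linear_step(x, y_stale, a, b, gamma)
    history.append(x.copy())
    # A DSEPL-style variant would extrapolate the proximal center with Polyak-type
    # momentum, e.g. x + beta * (x - history[-2]), before solving the same subproblem.

print("distance to +/- x_true:", min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true)))
```

The design point the sketch illustrates is that the master's subproblem stays cheap (here a closed form) even though the worker's linearization is stale; a $\texttt{DSGD}$-style baseline would instead apply the delayed subgradient directly, as indicated in the final comment.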

[1] Sebastian U. Stich, et al. Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning, 2022, NeurIPS.

[2] Blake E. Woodworth, et al. Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays, 2022, NeurIPS.

[3] Jie Chen, et al. Distributed Stochastic Inertial-Accelerated Methods with Delayed Derivatives for Nonconvex Problems, 2021, SIAM J. Imaging Sci.

[4] Amit Daniely, et al. Asynchronous Stochastic Optimization Robust to Arbitrary Delays, 2021, NeurIPS.

[5] Qi Deng, et al. Minibatch and Momentum Model-based Methods for Stochastic Weakly Convex Optimization, 2021, NeurIPS.

[6] Damek Davis, et al. Low-Rank Matrix Recovery with Composite Optimization: Good Conditioning and Rapid Convergence, 2021, Foundations of Computational Mathematics.

[7] Shahin Shahrampour, et al. On Distributed Nonconvex Optimization: Projected Subgradient Method for Weakly Convex Problems in Networks, 2020, IEEE Transactions on Automatic Control.

[8] Lin Xiao, et al. Stochastic variance-reduced prox-linear algorithms for nonconvex composite optimization, 2020, Mathematical Programming.

[9] Mikael Johansson, et al. Convergence of a Stochastic Gradient Method with Momentum for Nonsmooth Nonconvex Optimization, 2020, ICML.

[10] M. Papatriantafilou, et al. MindTheStep-AsyncPSGD: Adaptive Asynchronous Parallel Stochastic Gradient Descent, 2019, 2019 IEEE International Conference on Big Data (Big Data).

[11] Anthony Man-Cho So, et al. Incremental Methods for Weakly Convex Optimization, 2019, arXiv.

[12] Dmitriy Drusvyatskiy, et al. Low-Rank Matrix Recovery with Composite Optimization: Good Conditioning and Rapid Convergence, 2019, Found. Comput. Math.

[13] Martin Jaggi, et al. Error Feedback Fixes SignSGD and other Gradient Compression Schemes, 2019, ICML.

[14] Dmitriy Drusvyatskiy, et al. Uniform Graphical Convergence of Subgradients in Nonconvex Optimization and Learning, 2018, Math. Oper. Res.

[15] Dmitriy Drusvyatskiy, et al. Stochastic model-based minimization under high-order growth, 2018, arXiv.

[16] Ohad Shamir, et al. A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates, 2018, ALT.

[17] Niao He, et al. On the Convergence Rate of Stochastic Mirror Descent for Nonsmooth Nonconvex Optimization, 2018, arXiv:1806.04781.

[18] Dmitriy Drusvyatskiy, et al. Stochastic model-based minimization of weakly convex functions, 2018, SIAM J. Optim.

[19] Wei Zhang, et al. Asynchronous Decentralized Parallel Stochastic Gradient Descent, 2017, ICML.

[20] Haihao Lu. “Relative Continuity” for Non-Lipschitz Nonsmooth Convex Optimization Using Stochastic (or Deterministic) Mirror Descent, 2017, INFORMS Journal on Optimization.

[21] Damek Davis, et al. Proximally Guided Stochastic Subgradient Method for Nonsmooth, Nonconvex Problems, 2017, SIAM J. Optim.

[22] Wei Zhang, et al. Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent, 2017, NIPS.

[23] Feng Ruan, et al. Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval, 2017, Information and Inference: A Journal of the IMA.

[24] Feng Ruan, et al. Stochastic Methods for Composite and Weakly Convex Optimization Problems, 2017, SIAM J. Optim.

[25] Nenghai Yu, et al. Asynchronous Stochastic Gradient Descent with Delay Compensation, 2016, ICML.

[26] Wotao Yin, et al. On Nonconvex Decentralized Gradient Descent, 2016, IEEE Transactions on Signal Processing.

[27] Alexander J. Smola, et al. AdaDelay: Delay Adaptive Distributed Stochastic Optimization, 2016, AISTATS.

[28] Dmitriy Drusvyatskiy, et al. Efficiency of minimizing compositions of convex functions and smooth maps, 2016, Math. Program.

[29] Ji Liu, et al. Staleness-Aware Async-SGD for Distributed Deep Learning, 2015, IJCAI.

[30] Yijun Huang, et al. Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization, 2015, NIPS.

[31] Matthew J. Streeter, et al. Delay-Tolerant Algorithms for Asynchronous Distributed Online Learning, 2014, NIPS.

[32] Hamid Reza Feyzmahdavian, et al. A delayed proximal gradient method with linear convergence rate, 2014, 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP).

[33] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.

[34] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.

[35] Stephen J. Wright, et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, 2011, NIPS.

[36] John C. Duchi, et al. Distributed delayed stochastic optimization, 2011, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).

[37] Stephen J. Wright, et al. A proximal method for composite minimization, 2008, Mathematical Programming.

[38] Alexander Shapiro, et al. Stochastic Approximation approach to Stochastic Programming, 2013.

[39] D. Ruppert. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2004.

[40] H. Robbins. A Stochastic Approximation Method, 1951.

[41] Sai Praneeth Karimireddy. The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Updates, 2020.

[42] Ohad Shamir, et al. Stochastic Convex Optimization, 2009, COLT.

[43] Vivek S. Borkar, et al. Distributed Asynchronous Incremental Subgradient Methods, 2001.

[44] John N. Tsitsiklis, et al. Parallel and distributed computation, 1989.

[45] R. Fletcher. A model algorithm for composite nondifferentiable optimization problems, 1982.