Paper information - A Stochastic Second-Order Proximal Method for Distributed Optimization

A Stochastic Second-Order Proximal Method for Distributed Optimization

We propose a distributed stochastic second-order proximal (St-SoPro) method that enables agents in a network to cooperatively minimize the sum of their local loss functions without any centralized coordination. St-SoPro incorporates a decentralized second-order approximation into an augmented Lagrangian function and performs its updates with randomly sampled local gradients and Hessian matrices, which makes it efficient for large-scale problems. We show that, for restricted strongly convex and smooth problems, the agents converge linearly in expectation to a neighborhood of the optimum, and that this neighborhood can be made arbitrarily small under proper parameter settings. Simulations on real machine learning datasets demonstrate that St-SoPro outperforms several state-of-the-art methods in convergence speed as well as in computation and communication costs.
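
For orientation, the setting described in the abstract can be written in a generic form. The notation below ($N$ agents, local losses $f_i$, edge set $\mathcal{E}$ of a connected network, per-sample losses $\ell_{i,s}$, minibatch $\mathcal{B}_i$, constants $C$, $q$, $\epsilon$) is assumed for illustration and is not taken from the paper. It sketches the standard consensus formulation, minibatch estimates of local gradients and Hessians, and what linear convergence in expectation to a neighborhood usually means; it is not the St-SoPro update itself.

```latex
% Consensus reformulation: each agent i keeps a local copy x_i,
% and neighboring copies are constrained to agree.
\min_{x_1,\dots,x_N \in \mathbb{R}^d} \; \sum_{i=1}^{N} f_i(x_i)
\quad \text{s.t.} \quad x_i = x_j \quad \forall (i,j) \in \mathcal{E}

% Assumed finite-sum local loss and its minibatch gradient/Hessian estimates,
% with \mathcal{B}_i sampled uniformly at random from \{1,\dots,m_i\}.
f_i(x) = \frac{1}{m_i} \sum_{s=1}^{m_i} \ell_{i,s}(x), \qquad
g_i(x) = \frac{1}{|\mathcal{B}_i|} \sum_{s \in \mathcal{B}_i} \nabla \ell_{i,s}(x), \qquad
H_i(x) = \frac{1}{|\mathcal{B}_i|} \sum_{s \in \mathcal{B}_i} \nabla^2 \ell_{i,s}(x)

% Linear convergence in expectation to a neighborhood of the optimum x^\star:
% the error contracts geometrically up to a residual \epsilon.
\mathbb{E}\,\bigl\| x_i^k - x^\star \bigr\|^2 \;\le\; C\, q^k + \epsilon,
\qquad 0 < q < 1, \;\; \epsilon \ge 0
```

In this reading, the claim that the neighborhood "can be arbitrarily small" corresponds to driving $\epsilon$ toward zero through the choice of algorithm parameters.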

Jie Lu | Shanying Zhu | Chenyang Qiu | Zichong Ou
