A Distributed Cubic-Regularized Newton Method for Smooth Convex Optimization over Networks

We propose a distributed cubic-regularized Newton method for large-scale convex optimization over networks. The method requires only local computations and communications, making it suitable for federated learning applications over arbitrary network topologies. We establish an $O(k^{-3})$ convergence rate when the cost function is convex with Lipschitz-continuous gradient and Hessian, where $k$ is the iteration counter. We further derive network-dependent bounds on the communication required at each step of the algorithm, and we present numerical experiments that validate our theoretical results.
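For context, the method builds on the classical cubic-regularized Newton step of Nesterov and Polyak. In the distributed setting the cost is presumably of the standard consensus form $f(x) = \sum_{i=1}^{n} f_i(x)$, with node $i$ holding the local cost $f_i$; a sketch of the centralized update, assuming the regularization parameter $M$ upper-bounds the Lipschitz constant of $\nabla^2 f$, is

$$x_{k+1} = \operatorname*{arg\,min}_{y \in \mathbb{R}^d} \left\{ \langle \nabla f(x_k),\, y - x_k \rangle + \frac{1}{2} \langle \nabla^2 f(x_k)(y - x_k),\, y - x_k \rangle + \frac{M}{6} \lVert y - x_k \rVert^3 \right\}.$$

With $M$ chosen this way, the cubic model is a global upper bound on $f$, which is what yields global convergence guarantees without a line search.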

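As a concrete illustration (not the paper's distributed algorithm), the following minimal NumPy sketch solves the cubic subproblem above by bisecting on the step norm $r = \lVert s \rVert$, using the optimality condition $(\nabla^2 f(x_k) + \tfrac{Mr}{2} I)\, s = -\nabla f(x_k)$, which is valid for convex problems (positive-semidefinite Hessian); the function name and tolerances here are our own.

```python
import numpy as np

def cubic_newton_step(grad, hess, M, tol=1e-10):
    """Solve  min_s  <g, s> + 0.5 <H s, s> + (M/6) ||s||^3  for PSD H.

    Uses the optimality condition (H + (M/2)||s|| I) s = -g and bisects
    on r = ||s||, since ||s(r)|| - r is strictly decreasing in r.
    """
    n = grad.shape[0]
    eye = np.eye(n)

    def step_for_radius(r):
        # The shift (M r / 2) I keeps the linear system positive definite.
        return np.linalg.solve(hess + 0.5 * M * r * eye, -grad)

    # Grow the bracket until ||s(hi)|| <= hi.
    lo, hi = 0.0, 1.0
    while np.linalg.norm(step_for_radius(hi)) > hi:
        hi *= 2.0
    # Bisect to the fixed point r* = ||s(r*)||.
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(step_for_radius(mid)) > mid:
            lo = mid
        else:
            hi = mid
    return step_for_radius(hi)

# Example: one cubic Newton step on the quadratic f(x) = 0.5 x^T H x.
H = np.array([[2.0, 0.3], [0.3, 0.5]])
x = np.array([1.0, -2.0])
s = cubic_newton_step(H @ x, H, M=1.0)
print("next iterate:", x + s)
```

In the distributed method, each node would work only with local information and exchange messages with its neighbors to reach consensus; the sketch above shows only the centralized building block.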