A Distributed Cubic-Regularized Newton Method for Smooth Convex Optimization over Networks

We propose a distributed cubic-regularized Newton method for large-scale convex optimization over networks. The method requires only local computations and communications, making it suitable for federated learning applications over arbitrary network topologies. We establish an $O(k^{-3})$ convergence rate when the cost function is convex with Lipschitz-continuous gradient and Hessian, where $k$ is the iteration counter. We further derive network-dependent bounds on the communication required at each step of the algorithm, and we present numerical experiments that validate our theoretical results.
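For context, the method builds on the classical cubic-regularized Newton step of Nesterov and Polyak. In the distributed setting the cost is presumably of the standard consensus form $f(x) = \sum_{i=1}^{n} f_i(x)$, with node $i$ holding the local cost $f_i$; a sketch of the centralized update, assuming the regularization parameter $M$ upper-bounds the Lipschitz constant of $\nabla^2 f$, is

$$x_{k+1} = \operatorname*{arg\,min}_{y \in \mathbb{R}^d} \left\{ \langle \nabla f(x_k),\, y - x_k \rangle + \frac{1}{2} \langle \nabla^2 f(x_k)(y - x_k),\, y - x_k \rangle + \frac{M}{6} \lVert y - x_k \rVert^3 \right\}.$$

With $M$ chosen this way, the cubic model is a global upper bound on $f$, which is what yields global convergence guarantees without a line search.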

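As a concrete illustration (not the paper's distributed algorithm), the following minimal NumPy sketch solves the cubic subproblem above by bisecting on the step norm $r = \lVert s \rVert$, using the optimality condition $(\nabla^2 f(x_k) + \tfrac{Mr}{2} I)\, s = -\nabla f(x_k)$, which is valid for convex problems (positive-semidefinite Hessian); the function name and tolerances here are our own.

```python
import numpy as np

def cubic_newton_step(grad, hess, M, tol=1e-10):
    """Solve  min_s  <g, s> + 0.5 <H s, s> + (M/6) ||s||^3  for PSD H.

    Uses the optimality condition (H + (M/2)||s|| I) s = -g and bisects
    on r = ||s||, since ||s(r)|| - r is strictly decreasing in r.
    """
    n = grad.shape[0]
    eye = np.eye(n)

    def step_for_radius(r):
        # The shift (M r / 2) I keeps the linear system positive definite.
        return np.linalg.solve(hess + 0.5 * M * r * eye, -grad)

    # Grow the bracket until ||s(hi)|| <= hi.
    lo, hi = 0.0, 1.0
    while np.linalg.norm(step_for_radius(hi)) > hi:
        hi *= 2.0
    # Bisect to the fixed point r* = ||s(r*)||.
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(step_for_radius(mid)) > mid:
            lo = mid
        else:
            hi = mid
    return step_for_radius(hi)

# Example: one cubic Newton step on the quadratic f(x) = 0.5 x^T H x.
H = np.array([[2.0, 0.3], [0.3, 0.5]])
x = np.array([1.0, -2.0])
s = cubic_newton_step(H @ x, H, M=1.0)
print("next iterate:", x + s)
```

In the distributed method, each node would work only with local information and exchange messages with its neighbors to reach consensus; the sketch above shows only the centralized building block.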