Distributed Adaptive Newton Methods with Globally Superlinear Convergence

This paper considers the distributed optimization problem over a network, where the goal is to minimize a sum of local objective functions using only local computation and communication. Existing algorithms either adopt a linear consensus mechanism, which converges at best linearly, or assume that each node starts sufficiently close to an optimal solution; in either case they cannot achieve globally superlinear convergence. To break through the linear consensus rate, we propose a finite-time set-consensus method and incorporate it into Polyak's adaptive Newton method, yielding our distributed adaptive Newton algorithm (DAN). To avoid transmitting local Hessians, we adopt a low-rank approximation to compress them and design a communication-efficient variant, DAN-LA, which reduces the size of transmitted messages to $O(p)$ per iteration, where $p$ is the dimension of the decision vectors, matching first-order methods. We show that DAN and DAN-LA achieve global quadratic and superlinear convergence rates, respectively. Numerical experiments on logistic regression problems demonstrate the advantages over existing methods.
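To illustrate the communication saving behind low-rank Hessian compression, the sketch below (not the paper's exact DAN-LA scheme; the helper names are ours) compresses a symmetric $p \times p$ Hessian to its top eigenpair, so a node transmits $p+1$ numbers instead of $p^2$. By the Eckart-Young theorem, this is the best rank-1 approximation in spectral norm, with error equal to the second-largest eigenvalue magnitude.

```python
import numpy as np

def rank1_compress(H):
    """Return the top eigenpair (lam, v) of symmetric H: O(p) data to send."""
    lam, V = np.linalg.eigh(H)          # eigenvalues in ascending order
    i = np.argmax(np.abs(lam))          # largest-magnitude eigenvalue
    return lam[i], V[:, i]

def rank1_decompress(lam, v):
    """Receiver reconstructs the rank-1 surrogate lam * v v^T."""
    return lam * np.outer(v, v)

rng = np.random.default_rng(0)
p = 50
A = rng.standard_normal((p, p))
H = A @ A.T / p + np.eye(p)             # a symmetric positive-definite "Hessian"

lam, v = rank1_compress(H)
H1 = rank1_decompress(lam, v)

# Transmitted floats: p + 1 (vector plus scalar) vs p * p for the full Hessian.
# Eckart-Young: the spectral-norm error of the best rank-1 approximation
# equals the second-largest eigenvalue magnitude of the PSD matrix H.
err = np.linalg.norm(H - H1, 2)
second = np.sort(np.abs(np.linalg.eigvalsh(H)))[-2]
```

In a distributed setting, each node would send its compressed pair over the network and accumulate such rank-1 terms across iterations to build up a usable curvature estimate; the sketch only shows the per-message compression step.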
