PROVABLY COMMUNICATION-EFFICIENT ASYNCHRONOUS DISTRIBUTED INFERENCE FOR CONVEX AND NONCONVEX PROBLEMS

This paper proposes and analyzes an asynchronous, communication-efficient distributed optimization framework for a general class of machine learning and signal processing problems. At each iteration, worker machines compute gradients of a known empirical loss function using their own local data, and a master machine solves a related minimization problem to update the current estimate. We establish that the proposed algorithm converges at a sublinear rate in the number of communication rounds, matching the best theoretical rate achievable for nonconvex nonsmooth problems. Moreover, under a strong convexity assumption on the smooth part of the loss function, linear convergence is established. Extensive numerical experiments show that the proposed approach indeed improves, sometimes significantly, over other state-of-the-art algorithms in terms of total communication efficiency.
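For intuition only, below is a minimal, single-process sketch of the master-worker communication pattern the abstract describes, written in Python with NumPy. It is not the paper's algorithm: the least-squares local loss, the l1-regularized proximal step used as the master's "related minimization problem", the step sizes, and the random worker arrivals that mimic asynchrony are all illustrative assumptions.

# Illustrative sketch (assumptions noted above): workers send gradients of their
# local empirical loss, possibly computed at stale iterates, and the master
# updates the estimate by solving a small regularized subproblem.
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, n_local = 4, 10, 50

# Local data for an assumed least-squares loss f_i(x) = (1/2n) * ||A_i x - b_i||^2.
A = [rng.standard_normal((n_local, dim)) for _ in range(n_workers)]
x_true = rng.standard_normal(dim)
b = [Ai @ x_true + 0.1 * rng.standard_normal(n_local) for Ai in A]

def local_grad(i, x):
    # Gradient of worker i's local empirical loss at iterate x.
    return A[i].T @ (A[i] @ x - b[i]) / n_local

def master_update(x, avg_grad, step=0.1, lam=0.01):
    # One possible instance of the master's subproblem: a gradient step
    # followed by soft-thresholding (proximal step for an l1 regularizer).
    z = x - step * avg_grad
    return np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

x = np.zeros(dim)
stale = [np.zeros(dim) for _ in range(n_workers)]   # last iterate each worker received
grads = [local_grad(i, stale[i]) for i in range(n_workers)]

for t in range(200):
    i = rng.integers(n_workers)          # one worker finishes, asynchronously
    grads[i] = local_grad(i, stale[i])   # its gradient may be based on a stale iterate
    stale[i] = x.copy()                  # master sends back the current estimate
    x = master_update(x, np.mean(grads, axis=0))

print("final average local loss:",
      0.5 * np.mean([np.mean((A[i] @ x - b[i]) ** 2) for i in range(n_workers)]))

In this toy setup, each loop iteration corresponds to one communication round (one worker-to-master gradient message and one master-to-worker estimate message), which is the quantity the convergence rates in the abstract are stated over.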
