PROVABLY COMMUNICATION-EFFICIENT ASYNCHRONOUS DISTRIBUTED INFERENCE FOR CONVEX AND NONCONVEX PROBLEMS

This paper proposes and analyzes an asynchronous, communication-efficient distributed optimization framework for a general class of machine learning and signal processing problems. At each iteration, worker machines compute gradients of a known empirical loss function using their own local data, and a master machine solves a related minimization problem to update the current estimate. We establish that the proposed algorithm converges at a sublinear rate in the number of communication rounds, matching the best theoretical rate achievable for nonconvex nonsmooth problems. Moreover, under a strong convexity assumption on the smooth part of the loss function, linear convergence is established. Extensive numerical experiments show that the proposed approach indeed improves, sometimes significantly, over other state-of-the-art algorithms in terms of total communication efficiency.
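For intuition only, below is a minimal, single-process sketch of the master-worker communication pattern the abstract describes, written in Python with NumPy. It is not the paper's algorithm: the least-squares local loss, the l1-regularized proximal step used as the master's "related minimization problem", the step sizes, and the random worker arrivals that mimic asynchrony are all illustrative assumptions.

# Illustrative sketch (assumptions noted above): workers send gradients of their
# local empirical loss, possibly computed at stale iterates, and the master
# updates the estimate by solving a small regularized subproblem.
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, n_local = 4, 10, 50

# Local data for an assumed least-squares loss f_i(x) = (1/2n) * ||A_i x - b_i||^2.
A = [rng.standard_normal((n_local, dim)) for _ in range(n_workers)]
x_true = rng.standard_normal(dim)
b = [Ai @ x_true + 0.1 * rng.standard_normal(n_local) for Ai in A]

def local_grad(i, x):
    # Gradient of worker i's local empirical loss at iterate x.
    return A[i].T @ (A[i] @ x - b[i]) / n_local

def master_update(x, avg_grad, step=0.1, lam=0.01):
    # One possible instance of the master's subproblem: a gradient step
    # followed by soft-thresholding (proximal step for an l1 regularizer).
    z = x - step * avg_grad
    return np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

x = np.zeros(dim)
stale = [np.zeros(dim) for _ in range(n_workers)]   # last iterate each worker received
grads = [local_grad(i, stale[i]) for i in range(n_workers)]

for t in range(200):
    i = rng.integers(n_workers)          # one worker finishes, asynchronously
    grads[i] = local_grad(i, stale[i])   # its gradient may be based on a stale iterate
    stale[i] = x.copy()                  # master sends back the current estimate
    x = master_update(x, np.mean(grads, axis=0))

print("final average local loss:",
      0.5 * np.mean([np.mean((A[i] @ x - b[i]) ** 2) for i in range(n_workers)]))

In this toy setup, each loop iteration corresponds to one communication round (one worker-to-master gradient message and one master-to-worker estimate message), which is the quantity the convergence rates in the abstract are stated over.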
