More Iterations per Second, Same Quality - Why Asynchronous Algorithms May Drastically Outperform Traditional Ones

In this paper, we consider the convergence of a very general asynchronous-parallel algorithm called ARock, which includes many well-known asynchronous algorithms as special cases (gradient descent, proximal gradient, Douglas-Rachford splitting, ADMM, etc.). In asynchronous-parallel algorithms, the computing nodes simply use the most recent information they have access to, instead of waiting for a full update from every node in the system. Hence nodes need not waste time waiting for information, which can be a major bottleneck, especially in distributed systems. When the system has $p$ nodes, asynchronous algorithms may complete $\Theta(\ln(p))$ times as many iterations as synchronous algorithms over a given time period ("more iterations per second"). However, this gain comes at a cost: there is error associated with using outdated information, and it has remained an open question how many additional iterations are needed in total to compensate for this error. The main results of this paper answer this question. Loosely speaking, we prove that as the size of the problem becomes large, the number of additional iterations that asynchronous algorithms need becomes negligible compared to the total number ("same quality" of the iterations). Taken together, these results provide solid evidence of the potential of asynchronous algorithms to vastly speed up certain distributed computations.
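
To make the update mechanism concrete, here is a minimal Python sketch of an ARock-style asynchronous coordinate update (our own toy construction, not the paper's implementation): one random coordinate is updated per step using a stale copy of the iterate, mimicking a node that reads outdated information. The linear contraction $T(x) = Ax + b$, the delay bound, and the step size $\eta$ are all illustrative assumptions.

import numpy as np

# Toy ARock-style iteration (illustrative assumptions throughout): solve the
# fixed-point problem x = T(x) for the linear contraction T(x) = A @ x + b,
# updating one random coordinate per step from a randomly delayed copy of x.
rng = np.random.default_rng(1)
n, max_delay, eta = 50, 10, 0.5   # problem size, delay bound, step size

A = rng.standard_normal((n, n))
A *= 0.9 / np.linalg.norm(A, 2)              # scale so T is a contraction
b = rng.standard_normal(n)
x_star = np.linalg.solve(np.eye(n) - A, b)   # the true fixed point

history = []                                 # past iterates, for stale reads
x = np.zeros(n)
for k in range(20000):
    history.append(x.copy())
    delay = rng.integers(0, max_delay + 1)
    x_stale = history[max(0, k - delay)]     # read outdated information
    i = rng.integers(n)                      # pick a random coordinate
    residual_i = x_stale[i] - (A[i] @ x_stale + b[i])  # i-th entry of x - T(x)
    x[i] -= eta * residual_i                 # coordinate update with stale info

print("relative error:", np.linalg.norm(x - x_star) / np.linalg.norm(x_star))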

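To see where the $\Theta(\ln(p))$ factor comes from, the following simulation sketch (again our own illustration, assuming i.i.d. unit-mean exponential compute times per update) counts iterations under both schemes: a synchronous round lasts as long as the slowest of the $p$ nodes, i.e., the maximum of $p$ i.i.d. durations, whose expectation grows like $\ln(p)$, while an asynchronous node fires again as soon as it finishes its own update.

import numpy as np

rng = np.random.default_rng(0)
p, T = 100, 1000.0   # number of nodes, wall-clock time budget (assumptions)

# Synchronous: a round ends only when the slowest node finishes, so each
# round lasts the max of p i.i.d. Exponential(1) durations (~ln(p) on average).
t, sync_iters = 0.0, 0
while True:
    round_len = rng.exponential(1.0, size=p).max()
    if t + round_len > T:
        break
    t += round_len
    sync_iters += p              # every node contributes one update per round

# Asynchronous: each node fires as soon as it finishes, so its update count
# over [0, T] is Poisson(T) (a rate-1 Poisson process), independent of the rest.
async_iters = int(rng.poisson(T, size=p).sum())

print("synchronous iterations: ", sync_iters)
print("asynchronous iterations:", async_iters)
print(f"ratio {async_iters / sync_iters:.2f} vs. ln(p) = {np.log(p):.2f}")

For moderate $p$, the measured ratio lands within a constant of $\ln(p)$, consistent with the "$\Theta(\ln(p))$ times as many iterations" claim above.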