An Asynchronous Distributed Framework for Large-scale Learning Based on Parameter Exchanges

In many distributed learning problems, heterogeneous loads across computing machines can harm the overall performance of synchronous strategies. In this paper, we propose an effective asynchronous distributed framework for minimizing a sum of smooth functions, in which each machine performs iterations in parallel on its local function and updates a shared parameter asynchronously. In this way, all machines can work continuously even when they do not hold the latest version of the shared parameter. We prove the convergence of this general asynchronous distributed scheme for gradient iterations and then demonstrate its efficiency on the matrix factorization problem for recommender systems and on binary classification.
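To make the update pattern concrete, here is a minimal Python sketch (not the authors' implementation) of the asynchronous scheme described above: each worker thread repeatedly reads a possibly stale copy of the shared parameter, computes a gradient step on its own local smooth function, and writes its update back without waiting for the other workers. The quadratic local functions, the step size, and all names (`local_grad`, `worker`, etc.) are illustrative assumptions.

```python
import threading
import numpy as np

DIM = 10          # dimension of the shared parameter
N_WORKERS = 4     # number of "machines", here simulated as threads
N_ITERS = 200     # local iterations per worker
STEP = 0.05       # illustrative step size

rng = np.random.default_rng(0)
# Each machine holds a local smooth function f_i(x) = 0.5 * ||A_i x - b_i||^2.
A = [rng.standard_normal((20, DIM)) for _ in range(N_WORKERS)]
b = [rng.standard_normal(20) for _ in range(N_WORKERS)]

shared_x = np.zeros(DIM)   # shared parameter
lock = threading.Lock()    # protects writes to the shared parameter


def local_grad(i, x):
    """Gradient of the i-th local function at x."""
    return A[i].T @ (A[i] @ x - b[i])


def worker(i):
    """One machine: iterate on the local function with possibly stale reads."""
    global shared_x
    for _ in range(N_ITERS):
        x_stale = shared_x.copy()        # read without waiting for the others
        g = local_grad(i, x_stale)       # gradient at a possibly stale point
        with lock:                       # asynchronous write of the update
            shared_x -= (STEP / N_WORKERS) * g


threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Value of the global objective (sum of the local functions) at the final parameter.
obj = sum(0.5 * np.linalg.norm(A[i] @ shared_x - b[i]) ** 2 for i in range(N_WORKERS))
print("final objective:", obj)
```

In this sketch the workers never synchronize their reads, which mimics the staleness tolerated by the framework; only the write is serialized, a simplification of how a shared parameter server would apply concurrent updates.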
