Distributed Stochastic Gradient Descent Using LDGM Codes

We consider a distributed learning problem in which the computation is carried out on a system consisting of a master node and multiple worker nodes. In such systems, slow-running machines, called stragglers, can cause a significant loss of performance. Recently, a coding-theoretic framework for mitigating stragglers in distributed learning, named Gradient Coding (GC), was established by Tandon et al. Most studies on GC aim to recover the gradient information exactly, assuming that the Gradient Descent (GD) algorithm is used for learning. If the Stochastic Gradient Descent (SGD) algorithm is used instead, exact recovery of the gradient is unnecessary: an unbiased estimator of the gradient suffices. In this paper, we propose a distributed SGD scheme using Low-Density Generator Matrix (LDGM) codes. In the proposed system, exact recovery of the gradient may take longer than with existing GC methods; however, the master node can obtain a high-quality unbiased estimator of the gradient at low computational cost, which leads to an overall performance improvement.
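To make the unbiased-estimator idea concrete, the following is a minimal Python sketch of one coded SGD step under a simplified straggler model, assuming each worker independently responds in time with a known probability p. The sparse 0/1 encoding matrix B, the regular column degree d, and the 1/(pd) scaling are illustrative assumptions rather than the paper's exact construction: when every data partition is replicated to exactly d workers, the sum of the coded results that arrive has expectation pd times the full gradient, so rescaling yields an unbiased estimator.

```python
import numpy as np

# Toy simulation of straggler-tolerant distributed SGD with an
# LDGM-style (sparse, low column-degree) encoding matrix. All names
# and parameters here are illustrative assumptions, not the paper's
# exact construction.

rng = np.random.default_rng(0)

n_workers, k_parts, dim = 20, 20, 5   # workers, data partitions, gradient dimension
d = 3                                 # column degree: each partition goes to d workers
p = 0.7                               # assumed probability a worker responds in time

# Sparse 0/1 encoding matrix B (n_workers x k_parts) with regular column degree d.
B = np.zeros((n_workers, k_parts))
for i in range(k_parts):
    B[rng.choice(n_workers, size=d, replace=False), i] = 1.0

# Partial gradients g_i (one per data partition) and the full gradient.
g = rng.normal(size=(k_parts, dim))
full_gradient = g.sum(axis=0)

# Worker j computes the coded combination y_j = sum_i B[j, i] * g_i.
y = B @ g

# Straggler model: each worker independently finishes with probability p.
received = rng.random(n_workers) < p

# E[sum of received y_j] = p * d * full_gradient, so scaling the sum of
# whatever arrives by 1 / (p * d) gives an unbiased gradient estimate.
estimate = y[received].sum(axis=0) / (p * d)

print("full gradient :", np.round(full_gradient, 3))
print("estimate      :", np.round(estimate, 3))
```

Because the estimator is unbiased, its error behaves like ordinary SGD sampling noise and averages out over iterations, while the sparsity of B keeps both the workers' encoding and the master's combining step cheap; this is the motivation for low-density generator matrices over dense MDS-type constructions.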

[1] Suhas N. Diggavi, et al., "Straggler Mitigation in Distributed Optimization Through Data Encoding," NIPS, 2017.

[2] Babak Hassibi, et al., "Improving Distributed Gradient Descent Using Reed-Solomon Codes," IEEE International Symposium on Information Theory (ISIT), 2018.

[3] Pulkit Grover, et al., "Short-Dot: Computing Large Linear Transforms Distributedly Using Coded Short Dot Products," IEEE Transactions on Information Theory, 2017.

[4] Kannan Ramchandran, et al., "Speeding Up Distributed Machine Learning Using Codes," IEEE Transactions on Information Theory, 2015.

[5] Rüdiger L. Urbanke, et al., Modern Coding Theory, 2008.

[6] Mohammad Ali Maddah-Ali, et al., "Polynomial Codes: An Optimal Design for High-Dimensional Coded Matrix Multiplication," NIPS, 2017.

[7] Alexandros G. Dimakis, et al., "Gradient Coding: Avoiding Stragglers in Distributed Learning," ICML, 2017.

[8] Francis Bach, et al., "SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives," NIPS, 2014.

[9] Alexandros G. Dimakis, et al., "Gradient Coding From Cyclic MDS Codes and Expander Graphs," IEEE Transactions on Information Theory, 2017.

[10] Shai Ben-David, et al., Understanding Machine Learning: From Theory to Algorithms, 2014.

[11] Arya Mazumdar, et al., "Robust Gradient Descent via Moment Encoding with LDPC Codes," arXiv, 2018.

[12] Tong Zhang, et al., "Accelerating Stochastic Gradient Descent using Predictive Variance Reduction," NIPS, 2013.

[13] Min Ye, et al., "Communication-Computation Efficient Gradient Coding," ICML, 2018.