Fundamental Limits of Coded Linear Transform

In large-scale distributed linear transform problems, coded computation plays an important role in effectively dealing with "stragglers" (distributed computations that may be delayed by a few slow or faulty processors). We propose a coded computation strategy, referred to as the diagonal code, that achieves the optimal recovery threshold and the optimal computation load. This is the first code that simultaneously achieves this two-fold optimality in coded distributed linear transforms. Furthermore, by leveraging ideas from random proposal graph theory, we design two random codes that guarantee the optimal recovery threshold with high probability but with much lower computation load. These codes provide an order-wise improvement over the state of the art. Moreover, experimental results show significant improvement over both uncoded and existing coding schemes.
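To make the straggler problem and the recovery-threshold notion concrete, the following is a minimal sketch (not the paper's diagonal code) of a generic coded matrix-vector multiplication: the matrix is split into two row blocks, encoded into three coded blocks, and any two of the three worker results suffice to recover the full product, so one straggler can be tolerated. All block sizes and names here are illustrative assumptions.

```python
import numpy as np

# Illustrative (3, 2) coded scheme: 2 data blocks, 3 coded workers,
# recovery threshold 2 (any 2 responses decode the result).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)

A1, A2 = A[:2], A[2:]            # two uncoded row blocks of A
workers = [A1, A2, A1 + A2]      # simple MDS-style encoding of the blocks

# Each worker computes its coded block times x; suppose worker 0 straggles
# and its result never arrives.
results = {i: W @ x for i, W in enumerate(workers)}
del results[0]

# Decode from the 2 surviving results: A1 @ x = (A1 + A2) @ x - A2 @ x.
y1 = results[2] - results[1]
y2 = results[1]
y = np.concatenate([y1, y2])

assert np.allclose(y, A @ x)     # full product recovered despite the straggler
```

The paper's contribution lies in designing the encoding so that both the recovery threshold and the per-worker computation load are simultaneously optimal; the toy code above only illustrates the decoding-from-a-subset idea.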
