Distributed Linearly Separable Computation

This paper formulates a distributed computation problem in which a master asks $N$ distributed workers to compute a linearly separable function. The task function can be expressed as $K_c$ linear combinations of $K$ messages, where each message is a function of one dataset. Our objective is to find the optimal tradeoff between the computation cost (the number of datasets assigned to each worker) and the communication cost (the number of symbols the master must download), such that the master can recover the task function from the answers of any $N_r$ out of the $N$ workers. The formulated problem can be seen as a generalization of existing problems such as distributed gradient descent and distributed linear transform. In this paper, we consider the specific case where the computation cost is minimum, and propose novel converse and achievable bounds on the optimal communication cost. The proposed bounds coincide for some system parameters; when they do not match, we prove that the achievable distributed computing scheme is optimal under the constraint of a widely used `cyclic assignment' of the datasets to the workers. Our results also show that when $K = N$, with the same communication cost as the optimal distributed gradient descent coding scheme proposed by Tandon et al., from which the master recovers one linear combination of the $K$ messages, our proposed scheme lets the master recover any additional $N_r - 1$ independent linear combinations of the messages with high probability.
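The cyclic assignment mentioned above can be illustrated with a minimal sketch. Assuming $K = N$ (one dataset per worker slot) and minimum computation cost, each worker stores $N - N_r + 1$ consecutive datasets modulo $N$; any $N_r$ workers then jointly hold all $K$ datasets, since each dataset is replicated at $N - N_r + 1$ workers and at most $N - N_r$ workers are missing. The function name below is hypothetical, chosen for illustration only.

```python
from itertools import combinations

def cyclic_assignment(N, N_r):
    """Cyclic dataset assignment for K = N datasets (hypothetical helper).

    Worker i stores the N - N_r + 1 consecutive datasets
    {i, i+1, ..., i + N - N_r} (indices taken mod N).
    """
    m = N - N_r + 1  # datasets per worker (minimum computation cost)
    return [{(i + j) % N for j in range(m)} for i in range(N)]

# Example: N = 6 workers, answers from any N_r = 4 suffice.
N, N_r = 6, 4
Z = cyclic_assignment(N, N_r)

# Check the recovery condition: every choice of N_r workers
# collectively covers all K = N datasets.
for S in combinations(range(N), N_r):
    covered = set().union(*(Z[w] for w in S))
    assert covered == set(range(N))
```

The assertion verifies the coverage property that any converse-achieving scheme must satisfy; the coding of each worker's answer (which linear combinations of its locally computed messages it transmits) is the nontrivial part addressed by the paper and is not sketched here.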

[1] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.

[2] A. Salman Avestimehr, et al. The Exact Rate-Memory Tradeoff for Caching With Uncoded Prefetching, 2016, IEEE Transactions on Information Theory.

[3] Anindya Bijoy Das, et al. Straggler-Resistant Distributed Matrix Computation via Coding Theory: Removing a Bottleneck in Large-Scale Data Processing, 2020, IEEE Signal Processing Magazine.

[4] Mohammad Ali Maddah-Ali, et al. Straggler Mitigation in Distributed Matrix Multiplication: Fundamental Limits and Optimal Coding, 2018, 2018 IEEE International Symposium on Information Theory (ISIT).

[5] Amir Salman Avestimehr, et al. Near-Optimal Straggler Mitigation for Distributed Gradient Methods, 2017, 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

[6] Anindya Bijoy Das, et al. Distributed Matrix-Vector Multiplication: A Convolutional Coding Approach, 2019, 2019 IEEE International Symposium on Information Theory (ISIT).

[7] Amir Salman Avestimehr, et al. Lagrange Coded Computing: Optimal Design for Resiliency, Security and Privacy, 2018, AISTATS.

[8] Scott Shenker, et al. Spark: Cluster Computing with Working Sets, 2010, HotCloud.

[9] Li Tang, et al. Universally Decodable Matrices for Distributed Matrix-Vector Multiplication, 2019, 2019 IEEE International Symposium on Information Theory (ISIT).

[10] Kannan Ramchandran, et al. Communication-Efficient Gradient Coding for Straggler Mitigation in Distributed Learning, 2020, 2020 IEEE International Symposium on Information Theory (ISIT).

[11] Kannan Ramchandran, et al. High-dimensional coded matrix multiplication, 2017, 2017 IEEE International Symposium on Information Theory (ISIT).

[12] Amir Salman Avestimehr, et al. Tree Gradient Coding, 2019, 2019 IEEE International Symposium on Information Theory (ISIT).

[13] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[14] Giuseppe Caire, et al. Fundamental Limits of Caching in Wireless D2D Networks, 2014, IEEE Transactions on Information Theory.

[15] Syed A. Jafar, et al. Cross Subspace Alignment Codes for Coded Distributed Batch Computation, 2019.

[16] Ness B. Shroff, et al. Coded Sparse Matrix Multiplication, 2018, ICML.

[17] Richard Zippel, et al. Probabilistic algorithms for sparse polynomials, 1979, EUROSAM.

[18] Mohammad Ali Maddah-Ali, et al. Polynomial Codes: an Optimal Design for High-Dimensional Coded Matrix Multiplication, 2017, NIPS.

[19] Farzin Haddadpour, et al. Codes for Distributed Finite Alphabet Matrix-Vector Multiplication, 2018, 2018 IEEE International Symposium on Information Theory (ISIT).

[20] Babak Hassibi, et al. Improving Distributed Gradient Descent Using Reed-Solomon Codes, 2017, 2018 IEEE International Symposium on Information Theory (ISIT).

[21] Richard J. Lipton, et al. A Probabilistic Remark on Algebraic Program Testing, 1978, Inf. Process. Lett..

[22] Ness B. Shroff, et al. Fundamental Limits of Coded Linear Transform, 2018, ArXiv.

[23] Deniz Gündüz, et al. Straggler-Aware Distributed Learning: Communication–Computation Latency Trade-Off, 2020, Entropy.

[24] Alexandros G. Dimakis, et al. Gradient Coding: Avoiding Stragglers in Distributed Learning, 2017, ICML.

[25] Alexandros G. Dimakis, et al. Gradient Coding From Cyclic MDS Codes and Expander Graphs, 2017, IEEE Transactions on Information Theory.

[26] Min Ye, et al. Communication-Computation Efficient Gradient Coding, 2018, ICML.

[27] A. Salman Avestimehr, et al. A Fundamental Tradeoff Between Computation and Communication in Distributed Computing, 2016, IEEE Transactions on Information Theory.

[28] Daniela Tuninetti, et al. On the optimality of uncoded cache placement, 2015, 2016 IEEE Information Theory Workshop (ITW).

[29] Kannan Ramchandran, et al. Speeding Up Distributed Machine Learning Using Codes, 2015, IEEE Transactions on Information Theory.

[30] Mohammad Ali Maddah-Ali, et al. A Unified Coding Framework for Distributed Computing with Straggling Servers, 2016, 2016 IEEE Globecom Workshops (GC Wkshps).

[31] Jacob T. Schwartz, et al. Fast Probabilistic Algorithms for Verification of Polynomial Identities, 1980, J. ACM.

[32] Urs Niesen, et al. Fundamental limits of caching, 2012, 2013 IEEE International Symposium on Information Theory.

[33] Pulkit Grover, et al. "Short-Dot": Computing Large Linear Transforms Distributedly Using Coded Short Dot Products, 2017, IEEE Transactions on Information Theory.

[34] Farzin Haddadpour, et al. On the optimal recovery threshold of coded matrix multiplication, 2017, 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton).