Heterogeneous Computation Assignments in Coded Elastic Computing

We study the optimal design of a heterogeneous coded elastic computing (CEC) network in which machines have varying relative computation speeds. CEC, introduced by Yang et al., is a framework that mitigates the impact of elastic events, in which machines join and leave the network. A data set is distributed among storage-constrained machines using a Maximum Distance Separable (MDS) code such that any subset of machines of a specific size can perform the desired computations. This design eliminates the need to redistribute the data after each elastic event. In this work, we develop a procedure that, for an arbitrary heterogeneous computing network, minimizes the overall computation time by defining an optimal computation load, that is, the number of computations assigned to each machine. We then present an algorithm that constructs a specific computation assignment among the machines, one that makes use of the MDS code and meets the optimal computation load.
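As a rough illustration of what an optimal computation load could look like, the following minimal sketch works under assumptions that the abstract does not state. Loads are normalized so that each machine's load lies between 0 and 1. The loads must sum to L, a hypothetical name for the recovery threshold, meaning the number of machines that the MDS code requires to cover each unit of work. The computation time is the largest ratio of load to speed over all machines. Under this assumed model, the minimizing loads are proportional to speed and capped at 1. The function name `optimal_loads` and its parameters are illustrative and are not taken from the paper.

```python
from fractions import Fraction


def optimal_loads(speeds, L):
    """Return (loads, c) minimizing c = max(load[i] / speeds[i]).

    Assumed model (not stated in the abstract): 0 <= load[i] <= 1,
    sum(load) == L, and all speeds are positive.
    """
    n = len(speeds)
    if not 0 < L <= n:
        raise ValueError("L must satisfy 0 < L <= number of machines")
    if any(s <= 0 for s in speeds):
        raise ValueError("speeds must be positive")

    # Rank machines from fastest to slowest.
    order = sorted(range(n), key=lambda i: speeds[i], reverse=True)
    s = [Fraction(speeds[i]) for i in order]

    # Try saturating the k fastest machines at load 1, then split the
    # remaining L - k units of load in proportion to speed.
    for k in range(n):
        c = (Fraction(L) - k) / sum(s[k:])
        if c * s[k] <= 1:  # fastest unsaturated machine stays within its cap
            break

    loads = [Fraction(0)] * n
    for rank, i in enumerate(order):
        loads[i] = min(Fraction(1), c * s[rank])
    return loads, c


if __name__ == "__main__":
    loads, c = optimal_loads([1, 2, 4, 8], L=2)
    print([str(x) for x in loads], str(c))
```

With speeds 1, 2, 4, 8 and L = 2, the fastest machine is capped at load 1 and the other three receive 1/7, 2/7 and 4/7, for a computation time of 1/7 in these normalized units. This sketch covers only the load values. Deciding which machines process which pieces of the coded data, which is the subject of the assignment algorithm mentioned above, is not attempted here.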
