Distributed Computing with Heterogeneous Communication Constraints: The Worst-Case Computation Load and Proof by Contradiction

We consider a distributed computing framework in which the distributed nodes have different communication capabilities, motivated by the heterogeneous networks found in data centers and mobile edge computing systems. Following the structure of MapReduce, this framework consists of a Map computation phase, a Shuffle phase, and a Reduce computation phase. The Shuffle phase allows distributed nodes to exchange intermediate values under heterogeneous communication bottlenecks, that is, a different communication load constraint for each node. Focusing on two-node and three-node (K=2, 3) distributed computing systems with heterogeneous communication load constraints, we characterize the minimum total computation load, as well as the minimum worst-case computation load for some cases. The worst-case computation load is the computation load of the node with the heaviest job, so minimizing it could potentially minimize the system latency. We show that, for some cases, there is a tradeoff between the minimum total computation load and the minimum worst-case computation load, in the sense that both cannot be achieved at the same time. The achievability schemes are proposed with a careful design of the file assignment and data shuffling. Finally, beyond the cut-set bound, a novel converse is proposed using a proof by contradiction.
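The difference between the two metrics can be illustrated with a small sketch. It assumes, purely for illustration, that a node's computation load is the fraction of the N input files it maps, that the total load is the sum over nodes, and that the worst-case load is the maximum over nodes. The function name `computation_loads` and both file assignments are hypothetical and are not taken from the paper.

```python
from fractions import Fraction

def computation_loads(assignment, num_files):
    """Return (total_load, worst_case_load) for a file assignment.

    Assumption: each node's load is the fraction of the num_files
    input files it maps; this normalization is illustrative only.
    """
    per_node = [Fraction(len(files), num_files) for files in assignment]
    return sum(per_node), max(per_node)

# Two hypothetical assignments of N = 6 files to K = 3 nodes.
# Every file is mapped by exactly two nodes in both cases.
balanced = [{0, 1, 2, 3}, {2, 3, 4, 5}, {4, 5, 0, 1}]
skewed = [{0, 1, 2, 3, 4, 5}, {0, 1, 2}, {3, 4, 5}]

balanced_total, balanced_worst = computation_loads(balanced, 6)
skewed_total, skewed_worst = computation_loads(skewed, 6)

print(balanced_total, balanced_worst)
print(skewed_total, skewed_worst)
```

Both assignments have the same total load, but the skewed one places a heavier job on a single node, so its worst-case load is larger. This shows only why the two metrics are distinct; it does not model the communication load constraints or reproduce the tradeoff characterized in the paper.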
