On Distributed Computing with Heterogeneous Communication Constraints

We consider a distributed computing framework in which the distributed nodes have different communication capabilities, motivated by the heterogeneous networks found in data centers and mobile edge computing systems. Following the structure of MapReduce, this framework consists of a Map computation phase, a Shuffle phase, and a Reduce computation phase. The Shuffle phase allows the distributed nodes to exchange intermediate values in the presence of heterogeneous communication bottlenecks at different nodes (heterogeneous communication load constraints). For this setting, we characterize, in some cases, the minimum total computation load and the minimum worst-case computation load under the heterogeneous communication load constraints. While the total computation load depends on the sum of the computation loads of all nodes, the worst-case computation load depends on the computation load of the node with the heaviest job. We reveal an interesting insight: in some cases, there is a tradeoff between the minimum total computation load and the minimum worst-case computation load, in the sense that the two cannot be achieved simultaneously. The achievability schemes are proposed with careful design of the file assignment and the data shuffling. Beyond the cut-set bound, a novel converse is derived via proof by contradiction. For the general case, we identify two extreme regimes in which a coded scheme and an uncoded scheme are optimal, respectively.
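To make the three-phase structure concrete, here is a minimal uncoded sketch of a Map, Shuffle, Reduce pipeline over distributed nodes, with a per-node count of values sent during the Shuffle phase standing in for the communication load. All names (`map_fn`, `reduce_fn`, the even file assignment, the squaring workload) are illustrative assumptions, not the paper's construction; the paper's contribution lies in designing the file assignment and shuffling under heterogeneous per-node load constraints.

```python
# Illustrative sketch of the Map / Shuffle / Reduce phases; the file
# assignment and workload here are hypothetical, chosen for clarity.

K = 3                     # number of distributed nodes
N = 6                     # number of input files
files = list(range(N))    # file i carries payload i for simplicity

# File assignment: node k maps its assigned subset of files. (The paper
# designs this assignment under per-node communication constraints; here
# we simply split the files evenly, round-robin.)
assignment = {k: files[k::K] for k in range(K)}

def map_fn(f):
    # One intermediate value per reduce function q = 0..K-1.
    return {q: (q, f, f * f) for q in range(K)}

# --- Map phase: each node computes intermediate values locally. ---
intermediate = {k: [map_fn(f) for f in assignment[k]] for k in range(K)}

# --- Shuffle phase: node q needs the q-th intermediate value of every
# file; values for files it did not map itself must be communicated.
# We count, per node, how many values it sends to other nodes. ---
inbox = {q: [] for q in range(K)}
comm_load = {k: 0 for k in range(K)}
for k in range(K):
    for ivals in intermediate[k]:
        for q in range(K):
            if q != k:
                comm_load[k] += 1   # value sent from node k to node q
            inbox[q].append(ivals[q])

# --- Reduce phase: node q reduces all intermediate values for its
# reduce function q (here: a plain sum of the mapped payloads). ---
def reduce_fn(values):
    return sum(v for (_, _, v) in values)

outputs = {q: reduce_fn(inbox[q]) for q in range(K)}
```

With these toy parameters every node sends 4 intermediate values, so the total and worst-case communication loads coincide; the heterogeneous setting of the paper is precisely the one where per-node limits differ and the file assignment must be skewed accordingly.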
