Approximately Optimal Distributed Data Shuffling

Data shuffling between distributed workers is one of the critical steps in implementing large-scale learning algorithms. The focus of this work is to understand the fundamental trade-off between the amount of storage and the communication overhead for distributed data shuffling. We first present an information-theoretic formulation of the data shuffling problem, accounting for the underlying problem parameters (i.e., the number of workers $K$, the number of data points $N$, and the available storage $S$ per worker). Then, we derive an information-theoretic lower bound on the communication overhead for data shuffling as a function of these parameters. Next, we present a novel coded communication scheme and show that its communication overhead is within a multiplicative factor of at most $2$ of the lower bound. Furthermore, we introduce an improved aligned coded shuffling scheme, which achieves the optimal storage-communication trade-off for $K < 5$ and further reduces the maximum multiplicative gap to $7/6$ for $K \geq 5$.
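To make the coded-shuffling idea concrete, here is a minimal sketch, not the paper's exact scheme: it assumes a toy setting with $K = 2$ workers that each cache the data point they currently hold, so a single XOR broadcast serves both workers' new assignments and halves the communication relative to uncoded delivery. All names and the two-worker setup are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two data points, one per worker (illustrative N = K = 2 setting).
x = rng.integers(0, 256, size=8, dtype=np.uint8)  # held and cached by worker 1
y = rng.integers(0, 256, size=8, dtype=np.uint8)  # held and cached by worker 2

# New shuffle: worker 1 needs y, worker 2 needs x.
# Uncoded delivery: broadcast x and y separately (2 transmissions).
# Coded delivery: broadcast one XOR combination (1 transmission).
coded = x ^ y

# Each worker cancels its cached point to decode the point it needs.
y_at_worker1 = coded ^ x  # worker 1 subtracts its cached x
x_at_worker2 = coded ^ y  # worker 2 subtracts its cached y

assert np.array_equal(y_at_worker1, y)
assert np.array_equal(x_at_worker2, x)
print("One coded broadcast replaced two uncoded transmissions.")
```

The same cache-and-cancel principle underlies the general scheme: excess storage beyond a worker's own partition is used as side information, so broadcast coded combinations are simultaneously useful to multiple workers.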
