On the Fundamental Limits of Coded Data Shuffling

We consider the data shuffling problem, in which a master node is connected to a set of worker nodes via a shared link in order to communicate a set of files to them. The master node has access to a database of files. In every shuffling iteration, each worker node processes a new subset of files and has excess storage to partially cache the remaining files. We characterize the exact rate-memory trade-off for worst-case shuffling under the assumption that cached files are uncoded, by deriving the minimum communication rate for a given storage capacity per worker node. As a byproduct, we also characterize the exact rate-memory trade-off for any random shuffling when the number of files equals the number of worker nodes. We propose a novel deterministic and systematic coded shuffling scheme that improves on the state of the art. We then prove the optimality of the proposed scheme by deriving a matching lower bound and showing that its placement phase is optimal over all shuffles.
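To make the coding gain concrete, here is a minimal illustrative sketch (not the paper's actual scheme) of the basic idea behind coded shuffling with uncoded caching: with two workers whose caches hold each other's newly assigned file, the master can broadcast a single XOR instead of transmitting both files, and each worker decodes using its cached side information.

```python
# Illustrative toy example of coded data shuffling with uncoded caching.
# All names here (files, cache, broadcast) are hypothetical, chosen for the sketch.

def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Master's database of two equal-size files.
files = {"A": b"\x01\x02\x03", "B": b"\x10\x20\x30"}

# Current assignment (uncoded caches): worker 1 holds A, worker 2 holds B.
cache = {1: files["A"], 2: files["B"]}

# New shuffle swaps the assignment: worker 1 needs B, worker 2 needs A.
# Instead of sending both files, the master broadcasts one coded message.
broadcast = xor(files["A"], files["B"])

# Each worker XORs the broadcast with its cached file to recover the new one.
recovered_1 = xor(broadcast, cache[1])  # worker 1 decodes file B
recovered_2 = xor(broadcast, cache[2])  # worker 2 decodes file A

assert recovered_1 == files["B"]
assert recovered_2 == files["A"]
```

Here one coded transmission replaces two uncoded ones, halving the communication rate in this toy case; the paper's scheme generalizes this trade-off between per-worker cache size and broadcast rate.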