On the Fundamental Limits of Coded Data Shuffling

We consider the data shuffling problem, in which a master node is connected to a set of worker nodes via a shared link in order to communicate a set of files to them. The master node has access to a database of files. In every shuffling iteration, each worker node processes a new subset of files and has excess storage to partially cache the remaining files. We characterize the exact rate-memory trade-off for worst-case shuffling under the assumption that cached files are uncoded, by deriving the minimum communication rate for a given storage capacity per worker node. As a byproduct, we also characterize the exact rate-memory trade-off for any random shuffling when the number of files equals the number of worker nodes. We propose a novel deterministic and systematic coded shuffling scheme that improves on the state of the art. We then prove the optimality of the proposed scheme by deriving a matching lower bound and showing that its placement phase is optimal over all shuffles.
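To make the coding gain concrete, here is a minimal illustrative sketch (not the paper's actual scheme) of the basic idea behind coded shuffling with uncoded caching: with two workers whose caches hold each other's newly assigned file, the master can broadcast a single XOR instead of transmitting both files, and each worker decodes using its cached side information.

```python
# Illustrative toy example of coded data shuffling with uncoded caching.
# All names here (files, cache, broadcast) are hypothetical, chosen for the sketch.

def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Master's database of two equal-size files.
files = {"A": b"\x01\x02\x03", "B": b"\x10\x20\x30"}

# Current assignment (uncoded caches): worker 1 holds A, worker 2 holds B.
cache = {1: files["A"], 2: files["B"]}

# New shuffle swaps the assignment: worker 1 needs B, worker 2 needs A.
# Instead of sending both files, the master broadcasts one coded message.
broadcast = xor(files["A"], files["B"])

# Each worker XORs the broadcast with its cached file to recover the new one.
recovered_1 = xor(broadcast, cache[1])  # worker 1 decodes file B
recovered_2 = xor(broadcast, cache[2])  # worker 2 decodes file A

assert recovered_1 == files["B"]
assert recovered_2 == files["A"]
```

Here one coded transmission replaces two uncoded ones, halving the communication rate in this toy case; the paper's scheme generalizes this trade-off between per-worker cache size and broadcast rate.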