Fundamental Limits of Decentralized Data Shuffling

Data shuffling of training data among different computing nodes (workers) has been identified as a core element in improving the statistical performance of modern large-scale machine learning algorithms. Data shuffling is often considered one of the most significant bottlenecks in such systems because of its heavy communication load. Under a master-worker architecture (where a master has access to the entire dataset and only communication between the master and the workers is allowed), coding has recently been proved to reduce the communication load considerably. This work considers a different communication paradigm, referred to as decentralized data shuffling, in which workers are allowed to communicate with one another via a shared link. The decentralized data shuffling problem has two phases: workers communicate with each other during the data shuffling phase, and then update their stored content during the storage phase. The main challenge is to derive novel converse bounds and achievable schemes that account for the asymmetry of the workers' storages (i.e., the problem setting constrains workers to store different files), in order to characterize the fundamental limits of this problem. For the case of uncoded storage (i.e., each worker directly stores a subset of bits of the dataset), this paper proposes converse and achievable bounds (the latter based on distributed interference alignment and distributed clique-covering strategies) that are within a factor of 3/2 of one another. Under the constraint of uncoded storage, the proposed schemes are also exactly optimal when either the storage size is large or the system has at most four workers.
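To illustrate why coding helps when workers share a broadcast link, here is a minimal toy sketch. It assumes a hypothetical instance with three workers and two segments, and it is not the achievable scheme of this paper. Worker 1 stores segment s2 and requests s1. Worker 2 stores s1 and requests s2. Worker 3 stores both. Because each requesting worker already holds what the other one wants, a single XOR broadcast from worker 3 lets both of them decode. Uncoded shuffling would need two transmissions. The names `segments`, `storage`, `demands`, `encode`, and `decode` are illustrative only.

```python
from functools import reduce
from operator import xor

# Hypothetical toy instance (illustrative only, not from the paper).
# Segments are small integers standing in for equal-length bit strings.
segments = {"s1": 0b1011, "s2": 0b0110}

# Uncoded storage: every worker keeps some segments verbatim.
storage = {
    1: {"s2"},        # worker 1 stores s2 and requests s1
    2: {"s1"},        # worker 2 stores s1 and requests s2
    3: {"s1", "s2"},  # worker 3 stores both and acts as the sender
}
demands = {1: "s1", 2: "s2"}


def encode(sender, names):
    """Broadcast payload: XOR of the named segments, all stored by the sender."""
    assert set(names) <= storage[sender], "sender does not store every segment"
    return reduce(xor, (segments[n] for n in names))


def decode(worker, names, payload):
    """XOR out every segment the worker stores; exactly one unknown may remain."""
    unknown = [n for n in names if n not in storage[worker]]
    assert len(unknown) == 1, "payload is not decodable by this worker"
    known = (segments[n] for n in names if n in storage[worker])
    return unknown[0], reduce(xor, known, payload)


# One coded broadcast from worker 3 serves both requests.
payload = encode(3, ["s1", "s2"])
for worker, wanted in demands.items():
    name, value = decode(worker, ["s1", "s2"], payload)
    assert (name, value) == (wanted, segments[wanted])

coded_load = 1                 # a single XOR transmission
uncoded_load = len(demands)    # one plain transmission per request
print(f"coded: {coded_load}, uncoded: {uncoded_load}")
```

Making worker 1 the sender would fail the `encode` check, because worker 1 does not store s1. In the decentralized setting, the coded messages that are available therefore depend on what each sending worker happens to store. This is one way the asymmetry of the workers' storages enters the problem.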
