Hierarchical Coded Gradient Aggregation for Learning at the Edge

Client devices at the edge generate increasingly large amounts of rich data suitable for training powerful statistical models. However, privacy concerns and a heavy communication load make it infeasible to move client data to a centralized location for training. In many distributed learning setups, client nodes compute gradients on their local data while a central master server receives the local gradients and aggregates them to take the global model update step. To guarantee robustness against straggling communication links, we consider a hierarchical setup with $n_e$ clients and $n_h$ reliable helper nodes that are available to aid gradient aggregation at the master. To achieve resiliency against straggling client-to-helper links, we propose two approaches that leverage coded redundancy. The first, Aligned Repetition Coding (ARC), repeats gradient components on the helper links, allowing significant partial aggregation at the helpers and resulting in a helpers-to-master communication load ($C_{HM}$) of $\mathcal{O}(n_h)$. However, ARC incurs a client-to-helpers communication load ($C_{EH}$) of $\Theta(n_h)$, which is prohibitive for client nodes with limited and costly bandwidth. We therefore propose Aligned Maximum Distance Separable Coding (AMC), which achieves the optimal $C_{EH}$ of $\Theta(1)$ for a given resiliency threshold by applying an MDS code over the gradient components, while achieving a $C_{HM}$ of $\mathcal{O}(n_e)$.
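The MDS-coding idea behind AMC can be illustrated with a minimal sketch (this is an assumption-laden toy over the reals with a Vandermonde generator, not the paper's exact construction): a gradient is split into $k$ chunks, encoded into $n$ coded chunks, and any $k$ surviving links suffice for recovery.

```python
import numpy as np

# Toy MDS code over the reals: split a gradient into k chunks, encode into
# n coded chunks with a Vandermonde generator matrix, and recover the
# original chunks from ANY k received coded chunks (straggler resiliency).

def mds_encode(gradient, n, k):
    """Split `gradient` into k equal chunks and produce n coded chunks."""
    chunks = np.split(gradient, k)
    # Vandermonde rows [1, x, x^2, ...] at distinct points x = 1..n,
    # so every k x k submatrix is invertible (the MDS property).
    G = np.vander(np.arange(1, n + 1), k, increasing=True).astype(float)
    coded = [sum(G[i, j] * chunks[j] for j in range(k)) for i in range(n)]
    return G, coded

def mds_decode(G, coded, received_ids):
    """Recover the k original chunks from any k received coded chunks."""
    sub = G[received_ids, :]                      # k x k Vandermonde submatrix
    coded_mat = np.stack([coded[i] for i in received_ids])
    return np.linalg.inv(sub) @ coded_mat         # rows are the original chunks

# Usage: a length-6 gradient, k = 3 chunks, n = 5 coded links; any 3 suffice.
g = np.arange(6, dtype=float)
G, coded = mds_encode(g, n=5, k=3)
recovered = mds_decode(G, coded, received_ids=[0, 2, 4])
assert np.allclose(recovered.ravel(), g)
```

Because decoding succeeds from any $k$ of the $n$ coded chunks, each client's per-link load stays at a constant fraction of the gradient size, which is the intuition behind AMC's $\Theta(1)$ client-to-helpers load.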
