Coding for Straggler Mitigation in Federated Learning

We present a novel coded federated learning (FL) scheme for linear regression that mitigates the effect of straggling devices while retaining the privacy level of conventional FL. The proposed scheme combines one-time padding to preserve privacy with gradient codes to provide resiliency against stragglers, and it consists of two phases. In the first phase, the devices share a one-time padded version of their local data with a subset of the other devices. In the second phase, the devices and the central server collaboratively and iteratively train a global linear model using gradient codes on the one-time padded local data. To apply one-time padding to real-valued data, our scheme exploits a fixed-point arithmetic representation of the data. Unlike the coded FL scheme recently introduced by Prakash et al., the proposed scheme maintains the same level of privacy as conventional FL while achieving a similar training time. Compared to conventional FL, we show that the proposed scheme achieves training speedup factors of 6.6 and 9.2 on the MNIST and Fashion-MNIST datasets for accuracies of 95% and 85%, respectively.
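To make the first phase concrete, the sketch below shows one-time padding applied to a fixed-point representation of real-valued data: the data are quantized to integers in a ring Z_q and masked with a uniformly random pad, so that unpadding recovers the quantized data exactly. This is a minimal illustration only; the parameters (f fractional bits, ring size q) and helper names are our own assumptions, not the paper's notation.

```python
import numpy as np

f = 16           # number of fractional bits in the fixed-point representation (assumed)
q = 2 ** 32      # ring size; pads are drawn uniformly from Z_q (assumed)

def to_fixed_point(x):
    """Quantize real-valued data to integers in Z_q (two's-complement style)."""
    return np.round(x * (1 << f)).astype(np.int64) % q

def from_fixed_point(z):
    """Map ring elements back to reals, interpreting the upper half of Z_q as negatives."""
    z = np.asarray(z) % q
    z = np.where(z >= q // 2, z - q, z)
    return z.astype(np.float64) / (1 << f)

rng = np.random.default_rng(0)

# Device side: pad the local data before sharing it with other devices.
x = rng.standard_normal((4, 3))            # local data matrix
pad = rng.integers(0, q, size=x.shape)     # one-time pad, uniform over Z_q
padded = (to_fixed_point(x) + pad) % q     # the only thing that leaves the device

# Unpadding (by a party holding the pad) recovers the fixed-point data exactly.
recovered = from_fixed_point((padded - pad) % q)
assert np.allclose(recovered, from_fixed_point(to_fixed_point(x)))
```

Because the pad is uniform over Z_q and used only once, the padded data reveal nothing about the quantized data, which is the classical one-time-pad argument [10].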

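For the second phase, the following sketch illustrates how a gradient code lets the server recover the full least-squares gradient from any n - s responding workers. It uses the fractional-repetition construction from [26] as a stand-in; the paper's exact code construction and data assignment may differ, and all variable names and the straggler simulation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, s = 6, 2                  # 6 workers, up to s = 2 stragglers; (s + 1) divides n
d, m = 5, 60                 # model dimension and total number of samples (assumed)

X = rng.standard_normal((m, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(m)
theta = np.zeros(d)          # current model iterate

# Split the data into n partitions; partition i is nominally worker i's share.
parts = np.array_split(np.arange(m), n)
# Group the workers into n / (s + 1) groups; a group jointly covers the
# partitions whose indices match its members' indices.
groups = [list(range(g * (s + 1), (g + 1) * (s + 1))) for g in range(n // (s + 1))]

def partial_gradient(idx):
    """Least-squares gradient restricted to one data partition."""
    Xi, yi = X[idx], y[idx]
    return Xi.T @ (Xi @ theta - yi)

def worker_message(worker_id):
    """Coded message: sum of the partial gradients of all partitions in the worker's group."""
    group = groups[worker_id // (s + 1)]
    return sum(partial_gradient(parts[p]) for p in group)

# Server side: simulate s straggling workers and decode from one responder per group.
stragglers = set(rng.choice(n, size=s, replace=False).tolist())
decoded = np.zeros(d)
for members in groups:
    responder = next(w for w in members if w not in stragglers)  # always exists: group size is s + 1
    decoded += worker_message(responder)

# The decoded gradient equals the full (uncoded) gradient.
assert np.allclose(decoded, X.T @ (X @ theta - y))
```

The key property is that every group of s + 1 workers returns the same coded message, so even if all s stragglers fall in one group, the server still receives one message per group and their sum is the exact full gradient.
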
[1] Jakub Konečný et al., On the Outsized Importance of Learning Rates in Local Update Methods, 2020, arXiv.

[2] Anit Kumar Sahu et al., Federated Learning: Challenges, Methods, and Future Directions, 2019, IEEE Signal Processing Magazine.

[3] Qinghua Liu et al., Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization, 2020, NeurIPS.

[4] George J. Pappas et al., Achieving Linear Convergence in Federated Learning under Objective and Systems Heterogeneity, 2021, arXiv.

[5] Octavian Catrina et al., Secure Computation with Fixed-Point Numbers, 2010, Financial Cryptography.

[6] Suhas N. Diggavi et al., Straggler Mitigation in Distributed Optimization Through Data Encoding, 2017, NIPS.

[7] Anit Kumar Sahu et al., Federated Optimization in Heterogeneous Networks, 2018, MLSys.

[8] Shusen Yang et al., Asynchronous Federated Learning with Differential Privacy for Edge Intelligence, 2019, arXiv.

[9] Amir Salman Avestimehr et al., Coded Computation over Heterogeneous Clusters, 2017, IEEE International Symposium on Information Theory (ISIT).

[10] Claude E. Shannon, Communication Theory of Secrecy Systems, 1949, Bell System Technical Journal.

[11] Aryan Mokhtari et al., Straggler-Resilient Federated Learning: Leveraging the Interplay Between Statistical Accuracy and System Heterogeneity, 2020, IEEE Journal on Selected Areas in Information Theory.

[12] Farzin Haddadpour et al., On the Optimal Recovery Threshold of Coded Matrix Multiplication, 2017, 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[13] Peter Richtárik et al., Federated Learning: Strategies for Improving Communication Efficiency, 2016, arXiv.

[14] Nageen Himayat et al., Coded Computing for Low-Latency Federated Learning Over Wireless Edge Networks, 2020, IEEE Journal on Selected Areas in Communications.

[15] Pulkit Grover et al., "Short-Dot": Computing Large Linear Transforms Distributedly Using Coded Short Dot Products, 2017, IEEE Transactions on Information Theory.

[16] Kannan Ramchandran et al., Speeding Up Distributed Machine Learning Using Codes, 2015, IEEE Transactions on Information Theory.

[17] Alexandre Graell i Amat et al., Rateless Codes for Low-Latency Distributed Inference in Mobile Edge Computing, 2021, arXiv.

[18] Mohammad Ali Maddah-Ali et al., Polynomial Codes: An Optimal Design for High-Dimensional Coded Matrix Multiplication, 2017, NIPS.

[19] Albin Severinson et al., Block-Diagonal and LT Codes for Distributed Computing With Straggling Servers, 2017, IEEE Transactions on Communications.

[20] Parijat Dube et al., Slow and Stale Gradients Can Win the Race, 2018, IEEE Journal on Selected Areas in Information Theory.

[21] Indranil Gupta et al., Asynchronous Federated Optimization, 2019, arXiv.

[22] Stephen A. Jarvis et al., SAFA: A Semi-Asynchronous Protocol for Fast Federated Learning With Low Overhead, 2019, IEEE Transactions on Computers.

[23] Blaise Agüera y Arcas et al., Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016, AISTATS.

[24] Alexandre Graell i Amat et al., Private Edge Computing for Linear Inference Based on Secret Sharing, 2020, IEEE Global Communications Conference (GLOBECOM).

[25] Osvaldo Simeone et al., On Model Coding for Distributed Inference and Transmission in Mobile Edge Computing Systems, 2019, IEEE Communications Letters.

[26] Alexandros G. Dimakis et al., Gradient Coding: Avoiding Stragglers in Distributed Learning, 2017, ICML.

[27] Mohammad Ali Maddah-Ali et al., A Unified Coding Framework for Distributed Computing with Straggling Servers, 2016, IEEE Globecom Workshops (GC Wkshps).

[28] Roland Vollgraf et al., Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms, 2017, arXiv.