Towards Efficient Scheduling of Federated Mobile Devices Under Computational and Statistical Heterogeneity

Originating from distributed learning, federated learning enables privacy-preserving collaboration at a new level of abstraction by sharing only model parameters. While current research mainly focuses on optimizing learning algorithms and minimizing the communication overhead inherited from distributed learning, a considerable gap remains before federated learning can be deployed on real mobile devices. In this article, we begin with an empirical experiment demonstrating that computational heterogeneity is a more pronounced bottleneck than communication on the current generation of battery-powered mobile devices, and that existing methods are plagued by mobile stragglers. Further, non-identically distributed data across mobile users makes the selection of participants critical to accuracy and convergence. To tackle computational and statistical heterogeneity, we use data as a tuning knob and propose two efficient polynomial-time algorithms to schedule different workloads on various mobile devices, depending on whether data is identically or non-identically distributed. For identically distributed data, we combine partitioning with linear bottleneck assignment to achieve near-optimal training time without accuracy loss. For non-identically distributed data, we convert scheduling into an average cost minimization problem and propose a greedy algorithm to strike a reasonable balance between computation time and accuracy. We also build an offline profiler to quantify the runtime behavior of different devices, which serves as input to the scheduling algorithms. We conduct extensive experiments on a mobile testbed with two datasets and up to 20 devices. Compared with common benchmarks, the proposed algorithms achieve a 2-100× epoch-wise speedup and a 2-7 percent accuracy gain, and boost the convergence rate by more than 100 percent on CIFAR-10.
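To make the identically distributed case concrete, the min-max scheduling step can be illustrated with a standard thresholding solver for the linear bottleneck assignment problem (LBAP): binary-search over candidate bottleneck values and test feasibility with a maximum bipartite matching. The sketch below is a minimal illustration under that assumption, not the authors' implementation; the lbap helper and the 4x4 matrix of profiled per-partition training times are hypothetical placeholders standing in for the profiler's output.

```python
# Minimal LBAP sketch: match equal-sized data partitions (rows) to heterogeneous
# devices (columns) so that the slowest assigned pair finishes as early as possible.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_bipartite_matching


def lbap(cost):
    """Return an assignment (partition i -> device lbap(cost)[i]) that minimizes
    the maximum cost over all assigned pairs."""
    n = cost.shape[0]
    thresholds = np.unique(cost)          # candidate bottleneck values, sorted
    lo, hi = 0, len(thresholds) - 1
    best = None
    while lo <= hi:                       # binary search on the bottleneck value
        mid = (lo + hi) // 2
        # Keep only edges whose cost does not exceed the current threshold.
        feasible = csr_matrix((cost <= thresholds[mid]).astype(np.uint8))
        match = maximum_bipartite_matching(feasible, perm_type='column')
        if (match >= 0).all():            # a perfect matching exists under this threshold
            best = match
            hi = mid - 1                  # try to lower the bottleneck further
        else:
            lo = mid + 1
    return best


# Hypothetical profiled training times (seconds) of 4 equal partitions on 4 devices.
cost = np.array([[12.,  7., 20.,  9.],
                 [11.,  8., 18., 10.],
                 [13.,  6., 22.,  9.],
                 [12.,  7., 19., 11.]])
assignment = lbap(cost)
print("assignment:", assignment,
      "bottleneck time:", cost[np.arange(len(cost)), assignment].max())
```

Because the binary search visits at most O(log n^2) distinct cost values and each feasibility check is a polynomial-time bipartite matching, the whole procedure stays polynomial, consistent with the abstract's claim of efficient polynomial-time scheduling.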
