AutoFL: Enabling Heterogeneity-Aware Energy Efficient Federated Learning

Federated learning (FL) enables a cluster of decentralized mobile devices at the edge to collaboratively train a shared machine learning model while keeping all raw training samples on device. This decentralized training approach has been demonstrated as a practical solution to mitigate the risk of privacy leakage. However, enabling efficient FL deployment at the edge is challenging because of non-IID training data distribution, wide system heterogeneity, and stochastically varying runtime effects in the field. This paper jointly optimizes time-to-convergence and energy efficiency of state-of-the-art FL use cases by taking into account the stochastic nature of edge execution. We propose AutoFL, a tailor-designed reinforcement learning algorithm that learns to select the K participant devices and the per-device execution targets for each FL model aggregation round in the presence of stochastic runtime variance and system and data heterogeneity. By judiciously considering the unique characteristics of FL edge deployment, AutoFL achieves 3.6 times faster model convergence and 4.7 and 5.2 times higher energy efficiency for local clients and globally over the cluster of K participants, respectively.
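To make the selection problem concrete, the following is a minimal sketch of one way a value-based learner could choose K participants and a per-device execution target in each aggregation round. It is not AutoFL's actual algorithm: the epsilon-greedy policy, the tabular value estimates, the function names `select_participants` and `update_value`, and the target labels such as "cpu" and "gpu" are all assumptions made for illustration.

```python
import random

def select_participants(q, k, epsilon, rng):
    """Return {device: target} for k devices.

    q maps every (device, target) pair to an estimated value (hypothetical table).
    With probability epsilon the choice is random (exploration); otherwise each
    device is scored by its best target and the k highest-scoring devices win.
    """
    devices = sorted({d for d, _ in q})
    targets = sorted({t for _, t in q})
    if rng.random() < epsilon:
        chosen = rng.sample(devices, k)
        return {d: rng.choice(targets) for d in chosen}
    # Greedy step: best execution target per device, then top-k devices.
    best = {d: max(targets, key=lambda t: q[(d, t)]) for d in devices}
    ranked = sorted(devices, key=lambda d: q[(d, best[d])], reverse=True)
    return {d: best[d] for d in ranked[:k]}

def update_value(q, device, target, reward, lr=0.5):
    """Move the estimate for (device, target) toward the observed reward."""
    key = (device, target)
    q[key] += lr * (reward - q[key])
```

In such a sketch the reward observed after each round would combine whatever the designer wants to optimize, for example round time and energy use; how AutoFL defines its state, action, and reward is described in the paper itself, not here.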
