SmartPC: Hierarchical Pace Control in Real-Time Federated Learning System

Federated Learning is a technique for learning AI models through the collaboration of a large number of resource-constrained mobile devices while preserving data privacy. Instead of aggregating the training data from devices, Federated Learning uses multiple rounds of parameter aggregation to train a model, wherein the participating devices are coordinated to incrementally update a shared model with their locally learned parameters. To deploy a Federated Learning system efficiently over mobile devices, several critical issues, including timeliness and energy efficiency, must be well addressed. This paper proposes SmartPC, a hierarchical online pace control framework for Federated Learning that balances training time and model accuracy in an energy-efficient manner. SmartPC consists of two layers of pace control: global and local. Prior to every training round, the global controller first inspects the status (e.g., connectivity, availability, and remaining energy/resources) of every participating device, then selects qualified devices and assigns them a well-estimated virtual deadline for task completion. Within this virtual deadline, a statistically significant proportion (e.g., 60%) of the devices are expected to complete one round of local training and model updates, while the overall progress of the multi-round training procedure is maintained adaptively. On each device, a local pace controller then dynamically adjusts device settings such as CPU frequency so that the learning task meets the deadline with the least amount of energy consumption. We performed extensive experiments to evaluate SmartPC on both Android smartphones and simulation platforms using well-known datasets. The results show that SmartPC reduces energy consumption on mobile devices by up to 32.8% and achieves a 2.27x speedup in training time without degrading model accuracy.
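
The two-layer pace control described above can be illustrated with a minimal sketch. The function and field names below (global_pace_control, local_pace_control, est_round_time, the battery threshold, and the cycle-based workload model) are assumptions introduced for illustration rather than details taken from the paper: the global controller sets the virtual deadline as a quantile (e.g., the 60th percentile) of the selected devices' estimated round times, and the local controller picks the lowest CPU frequency that still finishes the local training workload before that deadline, which minimizes energy under a convex power-frequency model.

```python
import numpy as np

def global_pace_control(device_profiles, quantile=0.6):
    """Hypothetical global controller: select available devices with
    sufficient battery, then set a virtual deadline so that roughly the
    given fraction (e.g., 60%) of them can finish one local round."""
    selected = [d for d in device_profiles
                if d["available"] and d["battery"] > 0.2]
    est_times = np.array([d["est_round_time"] for d in selected])
    # Deadline = the chosen quantile of estimated round times, so about
    # 60% of the selected devices are expected to finish in time.
    deadline = float(np.quantile(est_times, quantile))
    return selected, deadline

def local_pace_control(workload_cycles, deadline_s, freq_levels_hz):
    """Hypothetical local controller: choose the lowest CPU frequency
    that still completes the training workload before the deadline."""
    for f in sorted(freq_levels_hz):           # try slowest (cheapest) first
        if workload_cycles / f <= deadline_s:  # meets the virtual deadline
            return f
    return max(freq_levels_hz)                 # otherwise run at full speed

# Example usage with made-up numbers:
profiles = [{"available": True, "battery": 0.8, "est_round_time": t}
            for t in (42.0, 55.0, 61.0, 70.0, 95.0)]
devices, deadline = global_pace_control(profiles)
freq = local_pace_control(workload_cycles=2.4e11, deadline_s=deadline,
                          freq_levels_hz=[0.8e9, 1.2e9, 1.8e9, 2.4e9])
```

In this sketch, lowering the quantile tightens the deadline (faster rounds, more stragglers dropped), while raising it relaxes the deadline so more devices contribute each round; the local controller trades the slack in that deadline for a lower CPU frequency and thus lower energy.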
