Oort: Informed Participant Selection for Scalable Federated Learning

Federated Learning (FL) is an emerging direction in distributed machine learning (ML) that enables in-situ model training and testing on edge data. Despite having the same end goals as traditional ML, FL executions differ significantly in scale, spanning thousands to millions of participating devices. As a result, data characteristics and device capabilities vary widely across clients. Yet, existing efforts randomly select FL participants, which leads to poor model and system efficiency. In this paper, we propose Oort to improve the performance of federated training and testing with guided participant selection. To improve time-to-accuracy performance in model training, Oort prioritizes clients that have both data offering the greatest utility in improving model accuracy and the capability to run training quickly. To enable FL developers to interpret their results in model testing, Oort enforces their requirements on the distribution of participant data while reducing the duration of federated testing by cherry-picking clients. Our evaluation shows that, compared to existing participant selection mechanisms, Oort improves time-to-accuracy performance by 1.2x-14.1x and final model accuracy by 1.3%-9.8%, while efficiently enforcing developer requirements on data distributions at the scale of millions of clients.
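
To make the selection idea concrete, below is a minimal Python sketch of the kind of utility-guided participant selection the abstract describes; it is not the authors' implementation. It assumes a client's statistical utility is approximated from its training-sample losses, discounted when the client is slower than a developer-preferred round duration; the function names, the speed-penalty exponent `alpha`, and the `epsilon` exploration fraction are all illustrative assumptions.

```python
import math
import random

def statistical_utility(sample_losses):
    # Approximate how useful a client's data is via the root-sum-of-squares
    # of its per-sample training losses, scaled by its sample count
    # (an importance-sampling-style heuristic; assumed here, not prescribed).
    if not sample_losses:
        return 0.0
    n = len(sample_losses)
    return n * math.sqrt(sum(l * l for l in sample_losses) / n)

def client_utility(sample_losses, duration, preferred_duration, alpha=2.0):
    # Combine data utility with a system-speed penalty: clients slower than
    # the developer-preferred round duration are downweighted.
    util = statistical_utility(sample_losses)
    if duration > preferred_duration:
        util *= (preferred_duration / duration) ** alpha
    return util

def select_participants(clients, k, preferred_duration, epsilon=0.1):
    # Pick k participants: mostly exploit the highest-utility clients, while
    # reserving an epsilon fraction of slots for random exploration.
    ranked = sorted(
        clients,
        key=lambda c: client_utility(c["losses"], c["duration"], preferred_duration),
        reverse=True,
    )
    n_exploit = max(0, k - int(epsilon * k))
    exploit = ranked[:n_exploit]
    rest = ranked[n_exploit:]
    explore = random.sample(rest, min(k - n_exploit, len(rest)))
    return exploit + explore

# Example usage with synthetic clients:
clients = [
    {"id": i,
     "losses": [random.random() for _ in range(32)],
     "duration": random.uniform(1.0, 20.0)}
    for i in range(100)
]
picked = select_participants(clients, k=10, preferred_duration=5.0)
print([c["id"] for c in picked])
```

The exploitation/exploration split reflects the tension the abstract implies: ranking purely by observed utility would never try clients whose data has not yet been seen, so a small random slice keeps the selector from locking onto a fixed cohort.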
