Fed-ensemble: Improving Generalization through Model Ensembling in Federated Learning

In this paper we propose Fed-ensemble: a simple approach that brings model ensembling to federated learning (FL). Instead of aggregating local models to update a single global model, Fed-ensemble uses random permutations to update a group of K models and then obtains predictions through model averaging. Fed-ensemble can be readily utilized within established FL methods and does not impose additional computational or communication overhead, since only one of the K models needs to be sent to a client in each communication round. Theoretically, we show that predictions on new data from all K models belong to the same predictive posterior distribution under a neural tangent kernel regime; this result in turn sheds light on the generalization advantages of model averaging. We also show that Fed-ensemble admits an elegant Bayesian interpretation. Empirical results show that our approach outperforms several FL algorithms on a wide range of datasets and excels in the heterogeneous settings often encountered in FL applications.
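To make the training procedure concrete, the following is a minimal PyTorch-style sketch of one Fed-ensemble communication round and of ensemble prediction, not the authors' implementation. The helper `local_update(model, client)`, the round-robin assignment after a random permutation, and the FedAvg-style aggregation within each model are assumptions made for illustration; the paper's exact stratification scheme may differ.

```python
import copy
import random

import torch


def fed_ensemble_round(models, clients, local_update):
    """One communication round of Fed-ensemble (illustrative sketch).

    models:       list of K server-side models (torch.nn.Module).
    clients:      list of client handles / datasets.
    local_update: user-supplied routine; local_update(model, client)
                  trains `model` on the client's data and returns the
                  updated state_dict.

    Each client receives exactly one of the K models, so per-client
    communication cost matches single-model FL.
    """
    K = len(models)

    # Randomly permute the participating clients, then assign them to the
    # K models round-robin.
    permuted = random.sample(clients, len(clients))
    assignments = [permuted[k::K] for k in range(K)]

    # Update every model FedAvg-style from the clients assigned to it.
    for model, assigned in zip(models, assignments):
        local_states = [local_update(copy.deepcopy(model), c) for c in assigned]
        if not local_states:
            continue  # no clients drawn for this model in this round
        averaged = {
            name: torch.stack([s[name].float() for s in local_states]).mean(dim=0)
            for name in local_states[0]
        }
        model.load_state_dict(averaged)
    return models


def ensemble_predict(models, x):
    """Prediction on new data: average the K models' outputs."""
    with torch.no_grad():
        return torch.stack([m(x) for m in models]).mean(dim=0)
```

Under uniform random assignment, every one of the K models sees, in expectation, the same client distribution over rounds, which is what allows the ensemble members to be treated as draws from a common predictive distribution, as sketched below.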

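The theoretical claim can be made concrete with the standard neural-tangent-kernel linearization; the following is a sketch of the well-known wide-network result (in the spirit of Jacot et al. and Lee et al.), not the paper's exact derivation. Here (X, Y) denotes the training data, Θ the NTK, and f_0 the network at its random initialization.

```latex
% Converged prediction of a wide (linearized) network trained on (X, Y)
% with squared loss:
f_{\infty}(x) \;=\; f_{0}(x) \;+\; \Theta(x, X)\,\Theta(X, X)^{-1}\bigl(Y - f_{0}(X)\bigr).

% Over random initializations, f_0 is approximately a zero-mean Gaussian
% process, so f_\infty(x) is Gaussian with mean \Theta(x,X)\Theta(X,X)^{-1}Y.
% The K ensemble members are i.i.d. draws from this same distribution, and
% averaging them keeps the mean while shrinking the initialization-dependent
% variance by a factor of K:
\operatorname{Var}\!\Bigl[\tfrac{1}{K}\sum_{k=1}^{K} f_{\infty}^{(k)}(x)\Bigr]
  \;=\; \tfrac{1}{K}\,\operatorname{Var}\bigl[f_{\infty}^{(1)}(x)\bigr].
```

In this view each of the K models is a sample from the same predictive posterior, and model averaging approximates the posterior mean, which is the intuition behind the stated generalization advantage.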