Federated Learning with Matched Averaging

Federated learning allows edge devices to collaboratively learn a shared model while keeping the training data on device, decoupling model training from the need to store data in the cloud. We propose the Federated Matched Averaging (FedMA) algorithm, designed for federated learning of modern neural network architectures such as convolutional neural networks (CNNs) and LSTMs. FedMA constructs the shared global model layer-wise by matching and averaging hidden elements (i.e., channels for convolutional layers, hidden states for LSTMs, and neurons for fully connected layers) with similar feature-extraction signatures. Our experiments indicate that FedMA outperforms popular state-of-the-art federated learning algorithms on deep CNN and LSTM architectures trained on real-world datasets, while also improving communication efficiency.
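As a rough illustration of the layer-wise matching step, the sketch below aligns the neurons of one layer from two clients with the Hungarian algorithm (via SciPy's `linear_sum_assignment`) and then averages the matched pairs. This is a simplified two-client stand-in, not the paper's exact method: FedMA's matching is derived from a Bayesian nonparametric (BBP-MAP) objective, handles many clients, and allows the global layer to grow when local neurons fail to match. The function name and the cosine-similarity cost here are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def matched_average(weights_a, weights_b):
    """Match the neurons (rows) of two clients' layer weight matrices,
    then average the matched pairs.

    weights_a, weights_b: (num_neurons, fan_in) arrays from two clients.
    Returns one averaged weight matrix of the same shape.
    """
    # Normalize rows so the cost reflects directional similarity only.
    a = weights_a / np.linalg.norm(weights_a, axis=1, keepdims=True)
    b = weights_b / np.linalg.norm(weights_b, axis=1, keepdims=True)
    # cost[i, j]: cost of matching neuron i of client A to neuron j of B
    # (negative cosine similarity, since the solver minimizes total cost).
    cost = -a @ b.T
    row_idx, col_idx = linear_sum_assignment(cost)  # Hungarian algorithm
    # Permute client B's neurons into client A's order, then average.
    return (weights_a[row_idx] + weights_b[col_idx]) / 2.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_a = rng.normal(size=(4, 8))
    # Client B: a permuted, slightly perturbed copy of client A's layer.
    w_b = w_a[[2, 0, 3, 1]] + 0.01 * rng.normal(size=(4, 8))
    print(matched_average(w_a, w_b).shape)  # (4, 8)
```

The toy usage above shows why matching matters: naive coordinate-wise averaging of `w_a` and `w_b` would blur functionally identical neurons that merely sit in different positions, whereas matching first recovers the permutation. In FedMA's full procedure this matching-and-averaging step is applied one layer per communication round, with clients retraining the layers above the frozen matched layers, which is where the reported communication savings come from.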
