Federated Multi-Task Learning under a Mixture of Distributions

The increasing volume of data generated by smartphones and IoT devices has motivated the development of Federated Learning (FL), a framework for on-device collaborative training of machine learning models. Early efforts in FL focused on learning a single global model with good average performance across clients, but such a model may be arbitrarily bad for a given client due to the inherent heterogeneity of local data distributions. Federated multi-task learning (MTL) approaches can learn personalized models by formulating a suitable penalized optimization problem. The penalization term can capture complex relations among personalized models, but this formulation eschews clear statistical assumptions about local data distributions. In this work, we propose to study federated MTL under the flexible assumption that each local data distribution is a mixture of unknown underlying distributions. This assumption encompasses most existing personalized FL approaches and leads to federated EM-like algorithms for both the client-server and the fully decentralized settings. Moreover, it provides a principled way to serve personalized models to clients not seen at training time. The convergence of these algorithms is analyzed through a novel federated surrogate optimization framework, which may be of independent interest. Experimental results on FL benchmarks show that our approach yields models with higher accuracy and fairness than state-of-the-art methods.
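To make the mixture assumption concrete, below is a minimal NumPy sketch of one federated EM-like round, assuming M shared linear components under squared loss. Everything here (the names `local_em_round` and `server_round`, the fixed learning rate, the closed-form E-step) is an illustrative assumption chosen for brevity, not the paper's exact algorithm, which handles general loss functions, client sampling, and a fully decentralized variant.

```python
# Hypothetical sketch of a federated EM-like round under the mixture
# assumption: each client's data comes from a mixture of M shared
# distributions, with client-specific mixing weights pi.
import numpy as np

def local_em_round(theta, pi, X, y, lr=0.1):
    """One client-side EM step: theta has shape (M, d) for M shared linear
    components; pi has shape (M,) and holds this client's mixing weights."""
    M = theta.shape[0]
    # E-step: responsibility of component m for each sample is proportional
    # to pi[m] * exp(-loss_m), computed in log space for numerical stability.
    losses = np.stack([(X @ theta[m] - y) ** 2 for m in range(M)])   # (M, n)
    log_q = np.log(pi)[:, None] - losses
    log_q -= log_q.max(axis=0, keepdims=True)
    q = np.exp(log_q)
    q /= q.sum(axis=0, keepdims=True)                                # (M, n)
    # M-step (local): mixing weights become average responsibilities, and
    # each component takes a responsibility-weighted gradient step.
    new_pi = q.mean(axis=1)
    grads = np.stack([2 * (q[m] * (X @ theta[m] - y)) @ X / len(y)
                      for m in range(M)])
    return theta - lr * grads, new_pi

def server_round(theta, clients):
    """Clients refine the shared components; the server averages them
    (weighted by local sample counts) while each pi stays personal."""
    thetas, pis, sizes = [], [], []
    for X, y, pi in clients:            # clients: list of (X, y, pi) tuples
        th, pi = local_em_round(theta, pi, X, y)
        thetas.append(th); pis.append(pi); sizes.append(len(y))
    weights = np.array(sizes, dtype=float) / sum(sizes)
    theta = sum(w * th for w, th in zip(weights, thetas))
    return theta, pis
```

Note the design split this sketch illustrates: the component parameters theta are shared and averaged across clients, while the mixing weights pi remain local. This is what allows a client unseen at training time to be served in a principled way, by fitting only its own pi on the frozen shared components.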
