Model-sharing Games: Analyzing Federated Learning Under Voluntary Participation

Federated learning is a setting in which agents, each with access to their own data source, combine models learned from local data to create a global model. If agents draw their data from different distributions, though, federated learning might produce a biased global model that is not optimal for every agent. Agents therefore face a fundamental question: should they join the global model or stay with their local model? In this work, we show how this situation can be naturally analyzed through the framework of coalitional game theory. We propose the following game: players are heterogeneous, differing both in the model parameters governing their data distributions and in the number of noisy samples they have drawn from those distributions. Each player's goal is to obtain a model with minimal expected mean squared error (MSE) on their own distribution. Each player can either fit a model based solely on their own data or combine their learned parameters with those of some subset of the other players. Combining models reduces the variance component of a player's error through access to more data, but increases the bias because of the heterogeneity of distributions. We derive exact expected MSE values for problems in linear regression and mean estimation, and use these values to analyze the resulting game in the framework of hedonic game theory: we study how players might divide into coalitions, where the players within each coalition jointly construct a single model. In a case with arbitrarily many players that each have either a "small" or "large" amount of data, we constructively show that there always exists a stable partition of players into coalitions.
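The variance-versus-bias trade-off described above can be illustrated for mean estimation with a minimal sketch. It is not taken from the paper: it assumes a simple model in which each player's true mean is drawn independently from a common distribution with variance `sigma2`, every sample carries noise with expected variance `mu_e`, and a coalition uses the sample-size-weighted average of its members' local means. The function names and parameters (`local_mse`, `federated_mse`, `prefers_federation`, `mu_e`, `sigma2`) are hypothetical and chosen only for illustration.

```python
def local_mse(n_j, mu_e):
    """Expected MSE of player j's own sample mean (assumed model):
    pure variance, shrinking with the player's sample count n_j."""
    return mu_e / n_j


def federated_mse(samples, j, mu_e, sigma2):
    """Expected MSE for player j when the coalition uses the
    sample-size-weighted average of all local means (assumed model).

    samples: list of sample counts, one per coalition member
    j:       index of the player whose error is evaluated
    mu_e:    expected noise variance of a single sample (assumption)
    sigma2:  variance of true means across players (assumption)
    """
    total = sum(samples)
    # Squared sample counts of every member except player j.
    others_sq = sum(n * n for i, n in enumerate(samples) if i != j)
    # Variance term: pooling data acts like having `total` samples.
    variance_term = mu_e / total
    # Bias term: the other members' means differ from player j's mean.
    bias_term = sigma2 * (others_sq + (total - samples[j]) ** 2) / total ** 2
    return variance_term + bias_term


def prefers_federation(samples, j, mu_e, sigma2):
    """True if player j expects strictly lower error in the coalition."""
    return federated_mse(samples, j, mu_e, sigma2) < local_mse(samples[j], mu_e)


if __name__ == "__main__":
    # Two players with equal data, under low and high heterogeneity.
    for sigma2 in (0.01, 1.0):
        print(sigma2, prefers_federation([10, 10], 0, mu_e=1.0, sigma2=sigma2))
```

Under these assumptions, pooling lowers the variance term from `mu_e / n_j` to `mu_e / total`, while the bias term grows with `sigma2`, so the same player may prefer the coalition when distributions are similar and prefer the local model when they are far apart.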
