Achieving Statistical Optimality of Federated Learning: Beyond Stationary Points

Federated Learning (FL) is a promising framework that has great potentials in privacy preservation and in lowering the computation load at the cloud. FedAvg and FedProx are two widely adopted algorithms. However, recent work raised concerns on these two methods: (1) their fixed points do not correspond to the stationary points of the original optimization problem, and (2) the common model found might not generalize well locally. In this paper, we alleviate these concerns. Towards this, we adopt the statistical learning perspective yet allow the distributions to be heterogeneous and the local data to be unbalanced. We show, in the general kernel regression setting, that both FedAvg and FedProx converge to the minimax-optimal error rates. Moreover, when the kernel function has a finite rank, the convergence is exponentially fast. Our results further analytically quantify the impact of the model heterogeneity and characterize the federation gain – the reduction of the estimation error for a worker to join the federated learning compared to the best local estimator. To the best of our knowledge, we are the first to show the achievability of minimax error rates under FedAvg and FedProx, and the first to characterize the gains in joining FL. Numerical experiments further corroborate our theoretical findings on the statistical optimality of FedAvg and FedProx and the federation gains.

[1]  M. Rudelson,et al.  Hanson-Wright inequality and sub-gaussian concentration , 2013 .

[2]  Benjamin Recht,et al.  Random Features for Large-Scale Kernel Machines , 2007, NIPS.

[3]  Yuhong Yang,et al.  Information-theoretic determination of minimax rates of convergence , 1999 .

[4]  Martin J. Wainwright,et al.  Early stopping for non-parametric regression: An optimal data-dependent stopping rule , 2011, 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[5]  Jaouad Mourtada Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices , 2019 .

[6]  Sebastian U. Stich,et al.  Local SGD Converges Fast and Communicates Little , 2018, ICLR.

[7]  Martin J. Wainwright,et al.  FedSplit: An algorithmic framework for fast federated optimization , 2020, NeurIPS.

[8]  Nguyen H. Tran,et al.  Personalized Federated Learning with Moreau Envelopes , 2020, NeurIPS.

[9]  Martin J. Wainwright,et al.  Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming , 2010, J. Mach. Learn. Res..

[10]  Mehrdad Mahdavi,et al.  Adaptive Personalized Federated Learning , 2020, ArXiv.

[11]  Junshan Zhang,et al.  A Collaborative Learning Framework via Federated Meta-Learning , 2020, ArXiv.

[12]  P. Bartlett,et al.  Local Rademacher complexities , 2005, math/0508275.

[13]  Yue Zhao,et al.  Federated Learning with Non-IID Data , 2018, ArXiv.

[14]  Alexander J. Smola,et al.  Parallelized Stochastic Gradient Descent , 2010, NIPS.

[15]  Sashank J. Reddi,et al.  SCAFFOLD: Stochastic Controlled Averaging for Federated Learning , 2019, ICML.

[16]  Roman Vershynin,et al.  Introduction to the non-asymptotic analysis of random matrices , 2010, Compressed Sensing.

[17]  Peter Richtárik,et al.  Federated Learning: Strategies for Improving Communication Efficiency , 2016, ArXiv.

[18]  Blaise Agüera y Arcas,et al.  Communication-Efficient Learning of Deep Networks from Decentralized Data , 2016, AISTATS.

[19]  Sreeram Kannan,et al.  Improving Federated Learning Personalization via Model Agnostic Meta Learning , 2019, ArXiv.

[20]  Richard Nock,et al.  Advances and Open Problems in Federated Learning , 2021, Found. Trends Mach. Learn..

[21]  Martin J. Wainwright,et al.  High-Dimensional Statistics , 2019 .

[22]  Fei Chen,et al.  Federated Meta-Learning with Fast Convergence and Efficient Communication , 2018 .

[23]  Peter Richtárik,et al.  Federated Optimization: Distributed Machine Learning for On-Device Intelligence , 2016, ArXiv.

[24]  Anit Kumar Sahu,et al.  Federated Optimization in Heterogeneous Networks , 2018, MLSys.

[25]  Aryan Mokhtari,et al.  Personalized Federated Learning: A Meta-Learning Approach , 2020, ArXiv.