Federated Functional Gradient Boosting

In this paper, we initiate a study of functional minimization in Federated Learning. First, in the semi-heterogeneous setting, where the marginal distributions of the feature vectors on client machines are identical, we develop the federated functional gradient boosting (FFGB) method that provably converges to the global minimum. Subsequently, we extend our results to the fully heterogeneous setting (where the marginal distributions of the feature vectors may differ) by designing an efficient variant of FFGB, called FFGB.C, which provably converges to a neighborhood of the global minimum whose radius depends on the total variation distances between the client feature distributions. For the special case of the square loss, still in the fully heterogeneous setting, we design the FFGB.L method, which also provably converges to a neighborhood of the global minimum, but with a radius depending on the much tighter Wasserstein-1 distances. For both FFGB.C and FFGB.L, the radii of convergence shrink to zero as the feature distributions become more homogeneous. Finally, we conduct proof-of-concept experiments demonstrating the benefits of our approach against natural baselines.
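To make the boosting mechanism concrete, the following Python sketch illustrates one plausible reading of functional gradient boosting in a federated setting under the square loss. It is not the paper's FFGB, FFGB.C, or FFGB.L algorithm: the decision-tree weak learner (scikit-learn's DecisionTreeRegressor), the step size eta, the uniform averaging of client updates, and the synthetic client data are all assumptions introduced purely for illustration.

    # Minimal, illustrative sketch of federated functional gradient boosting
    # under square loss. NOT the authors' FFGB / FFGB.C / FFGB.L; the weak
    # learner, step size, aggregation rule, and data are assumptions.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def predict(ensemble, X):
        """Evaluate the additive ensemble F(x) = sum_k w_k * h_k(x)."""
        out = np.zeros(X.shape[0])
        for w, h in ensemble:
            out += w * h.predict(X)
        return out

    def client_update(X, y, ensemble):
        """One client step: fit a weak learner to the negative functional gradient."""
        # For square loss 0.5*(F(x) - y)^2, the negative functional gradient
        # at each local data point is the residual y - F(x).
        residuals = y - predict(ensemble, X)
        h = DecisionTreeRegressor(max_depth=3)
        h.fit(X, residuals)
        return h

    def federated_boosting_round(clients, ensemble, eta=0.1):
        """Server step: fold the clients' weak-learner updates into the ensemble."""
        learners = [client_update(X, y, ensemble) for X, y in clients]
        # Appending each learner with weight eta / num_clients realizes a
        # uniform average of the client updates (an illustrative choice).
        for h in learners:
            ensemble.append((eta / len(clients), h))
        return ensemble

    # Usage: a few synthetic clients sharing the same regression target.
    rng = np.random.default_rng(0)
    clients = []
    for _ in range(4):
        X = rng.normal(size=(200, 5))
        y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)
        clients.append((X, y))

    ensemble = []
    for _ in range(50):
        ensemble = federated_boosting_round(clients, ensemble)

Each round, every client fits one weak learner to its local residuals and the server keeps a uniform average of those updates; repeating the round builds up the global additive model.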
