Optimal Client Sampling for Federated Learning

It is well understood that client-master communication can be a primary bottleneck in Federated Learning. In this work, we address this issue with a novel client subsampling scheme, in which we restrict the number of clients allowed to communicate their updates back to the master node. In each communication round, all participating clients compute their updates, but only those with "important" updates communicate back to the master. We show that importance can be measured using only the norm of the update, and we give a formula for the optimal client participation probabilities. This formula minimizes the distance between the full update, where all clients participate, and our limited update, where the number of participating clients is restricted. In addition, we provide a simple algorithm that approximates the optimal formula for client participation and requires only secure aggregation, thus preserving client privacy. We show both theoretically and empirically that our approach leads to superior performance for Distributed SGD (DSGD) and Federated Averaging (FedAvg) compared to the baseline where participating clients are sampled uniformly. Finally, our approach is orthogonal to and compatible with existing methods for reducing communication overhead, such as local methods and communication compression.
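The sketch below illustrates the mechanism described above, under simplifying assumptions: each client's participation probability is taken proportional to the norm of its update, scaled so that the expected number of communicating clients equals a budget m and capped at 1, and the selected updates are reweighted by inverse probabilities so the aggregate is unbiased for the full-participation sum. The function names, the one-step capping rule, and the plain summation aggregate are illustrative choices, not the paper's exact optimal formula or secure-aggregation protocol.

```python
import numpy as np

def sample_clients(update_norms, m, rng=None):
    """Norm-based client sampling sketch.

    Each client i is selected independently with probability
    p_i = min(1, m * ||U_i|| / sum_j ||U_j||), so that the expected
    number of participating clients is roughly m. Returns a boolean
    selection mask and inverse-probability weights for unbiased
    aggregation. (One-step capping; the paper derives the exact
    optimal probabilities.)
    """
    rng = np.random.default_rng() if rng is None else rng
    norms = np.asarray(update_norms, dtype=float)

    # Probabilities proportional to update norms, capped at 1.
    p = np.minimum(1.0, m * norms / max(norms.sum(), 1e-12))

    selected = rng.random(len(norms)) < p  # independent Bernoulli draws
    weights = np.where(selected, 1.0 / np.maximum(p, 1e-12), 0.0)
    return selected, weights


def aggregate(updates, selected, weights):
    """Sum of reweighted updates from the selected clients.

    In expectation this equals the sum of all client updates,
    i.e. the full-participation aggregate.
    """
    agg = np.zeros_like(np.asarray(updates[0], dtype=float))
    for u, s, w in zip(updates, selected, weights):
        if s:
            agg += w * np.asarray(u, dtype=float)
    return agg


# Toy usage: 10 clients, communication budget of 3 clients per round.
updates = [np.random.randn(5) for _ in range(10)]
norms = [np.linalg.norm(u) for u in updates]
selected, weights = sample_clients(norms, m=3)
print(aggregate(updates, selected, weights))
```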
