Federated Learning with Quantization Constraints

Traditional deep learning models are trained on centralized servers using labeled data samples collected from edge devices. These data often contain private information that users may be unwilling to share. Federated learning (FL) is an emerging approach for training such models without requiring users to share their possibly private labeled data. In FL, each user trains its copy of the learning model locally, and the server then collects the individual updates and aggregates them into a global model. A major challenge in this approach is the need for each user to efficiently transmit its learned model over the throughput-limited uplink channel. In this work, we tackle this challenge using tools from quantization theory. In particular, we identify the unique characteristics of conveying trained models over rate-constrained channels, and characterize a quantization scheme suited to such setups. We show that combining universal vector quantization methods with FL yields a decentralized training system that is both efficient and feasible. We also derive theoretical performance guarantees for the system. Our numerical results illustrate the substantial performance gains of our scheme over FL with previously proposed quantization approaches.
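To make the uplink pipeline described above concrete, the following is a minimal sketch of one federated-averaging round in which each client compresses its update with subtractive dithered quantization before transmission. This scalar variant is only an illustrative stand-in for the lattice-based universal vector quantization studied in the paper; all function names and parameters here (`dithered_quantize`, `fl_round`, `step`, `lr`) are assumptions introduced for illustration, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch (not the paper's implementation): one federated-averaging
# round where each client applies subtractive dithered quantization to its
# update, a simplified scalar stand-in for universal vector quantization.

def dithered_quantize(update, step, rng):
    """Client side: add a shared dither, then round onto the grid Z * step."""
    dither = rng.uniform(-step / 2, step / 2, size=update.shape)
    return np.round((update + dither) / step) * step, dither

def dithered_dequantize(quantized, dither):
    """Server side: subtract the shared dither to whiten the quantization error."""
    return quantized - dither

def fl_round(global_model, client_grads, step=0.05, lr=1.0, seed=0):
    """Aggregate dithered-quantized client updates into a new global model."""
    decoded = []
    for k, grad in enumerate(client_grads):
        # Client and server share the dither via a common seed, so only the
        # (entropy-coded) quantization indices would travel over the uplink.
        rng = np.random.default_rng(seed + k)
        q, dither = dithered_quantize(grad, step, rng)
        decoded.append(dithered_dequantize(q, dither))
    return global_model - lr * np.mean(decoded, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    model = np.zeros(10)
    grads = [rng.normal(size=10) for _ in range(4)]
    print(fl_round(model, grads))
```

In a real rate-constrained deployment, only the quantization indices would be encoded and sent, with the dither regenerated at the server from the shared seed; the vector (lattice) version replaces the scalar rounding with a multi-dimensional lattice quantizer.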
