UVeQFed: Universal Vector Quantization for Federated Learning

Traditional deep learning models are trained at a centralized server using data samples collected from users. Such data samples often include private information that the users may not be willing to share. Federated learning (FL) is an emerging approach for training such models without requiring the users to share their data. FL consists of an iterative procedure in which, at each iteration, the users locally train a copy of the learning model. The server then collects the individual updates and aggregates them into a global model. A major challenge in this method is the need for each user to repeatedly transmit its learned model over the throughput-limited uplink channel. In this work, we tackle this challenge using tools from quantization theory. In particular, we identify the unique characteristics of conveying trained models over rate-constrained channels and propose a suitable quantization scheme for such settings, referred to as universal vector quantization for FL (UVeQFed). We show that combining universal vector quantization methods with FL yields a decentralized training system in which compression of the trained models induces only minimal distortion. We then analyze the distortion theoretically, showing that it vanishes as the number of users grows. We also characterize how models trained with conventional federated averaging combined with UVeQFed converge to the model that minimizes the loss function. Our numerical results demonstrate the gains of UVeQFed over previously proposed methods in terms of both the distortion induced by quantization and the accuracy of the resulting aggregated model.
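
To make the training loop concrete, below is a minimal sketch of one federated-averaging round in which each user compresses its model update with subtractive dithered quantization before transmission. The scalar (one-dimensional lattice) quantizer, the per-user shared dither seeds, and the single local gradient step are simplifying assumptions made for illustration; they do not reproduce the paper's multi-dimensional lattice encoder or entropy coding stage in UVeQFed.

```python
import numpy as np

def dithered_quantize(update, step, rng):
    """Quantize a model update with subtractive dither.

    The dither is generated from a seed shared with the server, so the
    server can subtract it after decoding (hypothetical interface)."""
    dither = rng.uniform(-step / 2, step / 2, size=update.shape)
    quantized = step * np.round((update + dither) / step)  # sent over the uplink
    return quantized, dither

def dithered_dequantize(quantized, dither):
    """Server side: subtract the shared dither to recover the update estimate."""
    return quantized - dither

def federated_round(global_model, local_datasets, seeds, step=0.05, lr=0.1):
    """One FedAvg round with quantized uplink updates (illustrative only)."""
    decoded_updates = []
    for (X, y), seed in zip(local_datasets, seeds):
        # Local training: a single gradient step of linear least squares,
        # standing in for the users' local SGD epochs.
        grad = X.T @ (X @ global_model - y) / len(y)
        update = -lr * grad

        user_rng = np.random.default_rng(seed)  # seed known to both sides
        q, dither = dithered_quantize(update, step, user_rng)
        decoded_updates.append(dithered_dequantize(q, dither))

    # Server aggregation: averaging the decoded updates; the quantization
    # error averages out, which is why the distortion shrinks as the
    # number of users grows.
    return global_model + np.mean(decoded_updates, axis=0)

# Toy run: 10 users, each holding a small linear-regression dataset.
rng = np.random.default_rng(0)
d, n_users = 5, 10
true_w = rng.normal(size=d)
datasets = [(X := rng.normal(size=(50, d)), X @ true_w + 0.1 * rng.normal(size=50))
            for _ in range(n_users)]

w = np.zeros(d)
for t in range(200):
    w = federated_round(w, datasets, seeds=[t * n_users + u for u in range(n_users)])
print(np.linalg.norm(w - true_w))  # small residual despite the quantized uplink
```

In this toy setting the aggregated model approaches the least-squares solution even though each user transmits only coarsely quantized updates, illustrating the averaging effect the paper analyzes.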
