Ternary Compression for Communication-Efficient Federated Learning

Learning over massive data stored in different locations is essential in many real-world applications. However, sharing such data is challenging owing to the growing demands for privacy and security that accompany the proliferation of smart mobile and Internet of Things (IoT) devices. Federated learning offers a potential solution for privacy-preserving and secure machine learning by jointly training a global model without uploading the data distributed across devices to a central server. However, most existing work on federated learning adopts models with full-precision weights, and almost all of these models contain a large number of redundant parameters that need not be transmitted to the server, incurring excessive communication costs. To address this issue, we propose a federated trained ternary quantization (FTTQ) algorithm, which optimizes the quantized networks on the clients through a self-learning quantization factor. We provide theoretical proofs of the convergence of the quantization factors and the unbiasedness of FTTQ, and show that it reduces weight divergence. Building on FTTQ, we propose a ternary federated averaging protocol (T-FedAvg) that reduces both the upstream and downstream communication of federated learning systems. Empirical experiments training widely used deep learning models on publicly available data sets demonstrate that the proposed T-FedAvg is effective in reducing communication costs and can even achieve slightly better performance on non-IID data than canonical federated learning algorithms.
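To make the compression step concrete, the sketch below illustrates the general shape of ternary quantization with a single scaling factor, as in FTTQ: each local weight tensor is reduced to entries in {-1, 0, +1} plus one float, so an upload costs roughly 2 bits per weight instead of 32. The threshold rule (a fraction of the maximum weight magnitude) and the closed-form factor used here are illustrative assumptions; in FTTQ the quantization factor is learned during local training.

```python
import numpy as np

def fttq_quantize(w, delta_ratio=0.05):
    """Ternary quantization of a weight tensor (minimal sketch).

    Small-magnitude weights are zeroed; the rest are mapped to a
    shared factor +/- alpha. The threshold (delta_ratio * max|w|)
    and the closed-form alpha (mean magnitude of retained weights)
    are illustrative choices, not the paper's exact FTTQ rule,
    where alpha is optimized on the client during training.
    """
    delta = delta_ratio * np.max(np.abs(w))        # sparsity threshold
    mask = np.abs(w) > delta                       # weights that survive
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    ternary = (np.sign(w) * mask).astype(np.int8)  # entries in {-1, 0, +1}
    return ternary, np.float32(alpha)
```

The int8 ternary tensor can be further packed to 2 bits per entry for transmission; only the single float alpha is sent at full precision.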
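Building on the quantized uploads, a T-FedAvg-style round would have the server reconstruct each client's model from its (ternary, alpha) pair, form a data-weighted average as in FedAvg, and re-quantize the aggregate so that the downstream broadcast is ternary as well. The sketch below is a minimal illustration under those assumptions (the function names and the re-quantization of the aggregate are ours, not the paper's exact protocol); the usage example reuses fttq_quantize from the sketch above.

```python
import numpy as np

def t_fedavg_round(uploads, client_sizes, delta_ratio=0.05):
    """One communication round of a T-FedAvg-style protocol (sketch).

    `uploads` holds (ternary, alpha) pairs produced on the clients.
    The server averages the reconstructed models, weighting by local
    data set size, then compresses the aggregate again so downstream
    traffic stays ternary. Illustrative assumptions, not the paper's
    exact aggregation rule.
    """
    total = float(sum(client_sizes))
    # Upstream: reconstruct each client's quantized model and average.
    avg = sum((n / total) * (alpha * ternary)
              for (ternary, alpha), n in zip(uploads, client_sizes))
    # Downstream: re-quantize the aggregate before broadcasting it.
    delta = delta_ratio * np.max(np.abs(avg))
    mask = np.abs(avg) > delta
    new_alpha = np.abs(avg[mask]).mean() if mask.any() else 0.0
    return (np.sign(avg) * mask).astype(np.int8), np.float32(new_alpha)

# Example round with five clients holding different data volumes
# (random vectors stand in for trained local weights):
rng = np.random.default_rng(0)
uploads = [fttq_quantize(rng.standard_normal(1000)) for _ in range(5)]
global_ternary, global_alpha = t_fedavg_round(uploads, [600, 800, 1000, 500, 700])
```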
