Decentralized and Model-Free Federated Learning: Consensus-Based Distillation in Function Space

This paper proposes a fully decentralized federated learning (FL) scheme for Internet of Everything (IoE) devices connected via multi-hop networks. Because FL algorithms can hardly guarantee convergence of the parameters of machine learning (ML) models, this paper focuses instead on the convergence of ML models in function space. Since the representative loss functions of ML tasks, e.g., mean squared error (MSE) and Kullback-Leibler (KL) divergence, are convex functionals, algorithms that directly update functions in function space can converge to the optimal solution. The key idea of this paper is to tailor a consensus-based optimization algorithm to operate in function space and achieve the global optimum in a distributed manner. The paper first analyzes the convergence of the proposed algorithm in function space, which is referred to as a meta-algorithm, and shows that spectral graph theory can be applied to the function space in a manner similar to that of numerical vectors. Then, consensus-based multi-hop federated distillation (CMFD) is developed as a neural network (NN) implementation of the meta-algorithm. CMFD leverages knowledge distillation to realize function aggregation among adjacent devices without parameter averaging. An advantage of CMFD is that it works even when the distributed learners use different NN models. Although CMFD does not perfectly reflect the behavior of the meta-algorithm, the discussion of the meta-algorithm's convergence property promotes an intuitive understanding of CMFD, and simulation evaluations show that NN models converge using CMFD for several tasks. The simulation results also show that CMFD achieves higher accuracy than parameter aggregation for weakly connected networks and is more stable than parameter aggregation methods.
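
The sketch below illustrates the core idea behind consensus-based federated distillation as described above: each device trains on its private data and then pulls its model toward the averaged soft predictions of its neighbors on a shared unlabeled dataset, so aggregation happens in function space rather than by parameter averaging. This is a minimal illustration, not the authors' exact CMFD algorithm; the ring topology, synthetic data, temperature, and all other names and hyperparameters are illustrative assumptions.

```python
# Hedged sketch of consensus-based federated distillation over a multi-hop graph.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):
    """Toy classifier; devices may in principle use different architectures."""
    def __init__(self, in_dim=20, n_classes=5):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_classes))
    def forward(self, x):
        return self.net(x)

def local_step(model, opt, x, y):
    """One supervised step on the device's private data."""
    opt.zero_grad()
    F.cross_entropy(model(x), y).backward()
    opt.step()

def distill_step(model, opt, x_shared, neighbor_logits, temp=2.0):
    """Distill toward the average of adjacent devices' predictions (function-space consensus)."""
    target = torch.stack(neighbor_logits).mean(dim=0)  # consensus target in output space
    opt.zero_grad()
    loss = F.kl_div(F.log_softmax(model(x_shared) / temp, dim=1),
                    F.softmax(target / temp, dim=1),
                    reduction="batchmean") * temp ** 2
    loss.backward()
    opt.step()

# Toy multi-hop topology: a ring of 4 devices (device i is adjacent to i-1 and i+1).
n_devices, in_dim, n_classes = 4, 20, 5
adjacency = {i: [(i - 1) % n_devices, (i + 1) % n_devices] for i in range(n_devices)}

models = [SmallNet(in_dim, n_classes) for _ in range(n_devices)]
opts = [torch.optim.SGD(m.parameters(), lr=0.05) for m in models]

# Synthetic private datasets plus a shared unlabeled set used only for distillation.
private = [(torch.randn(64, in_dim), torch.randint(0, n_classes, (64,)))
           for _ in range(n_devices)]
x_shared = torch.randn(128, in_dim)

for rnd in range(10):
    # 1) Local training on each device's private data.
    for i in range(n_devices):
        local_step(models[i], opts[i], *private[i])
    # 2) Exchange predictions with adjacent devices and distill (no parameter averaging).
    with torch.no_grad():
        logits = [m(x_shared) for m in models]
    for i in range(n_devices):
        distill_step(models[i], opts[i], x_shared,
                     [logits[j] for j in adjacency[i]])
```

Because only predictions on the shared inputs are exchanged, the devices' models never need to share a parameterization, which is what allows heterogeneous NN architectures in this setting.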
