Data-Free Knowledge Distillation for Heterogeneous Federated Learning

Federated Learning (FL) is a decentralized machine-learning paradigm in which a global server iteratively averages the model parameters of local users without accessing their data. User heterogeneity imposes significant challenges on FL and can yield drifted global models that are slow to converge. Knowledge Distillation has recently emerged to tackle this issue by refining the server model with aggregated knowledge from heterogeneous users, rather than directly averaging their model parameters. This approach, however, depends on a proxy dataset, making it impractical unless such a prerequisite is satisfied. Moreover, the ensemble knowledge is not fully utilized to guide local model learning, which may in turn degrade the quality of the aggregated model. Inspired by prior art, we propose a data-free knowledge distillation approach for heterogeneous FL, where the server learns a lightweight generator that ensembles user information in a data-free manner; the generator is then broadcast to users, regulating local training with the learned knowledge as an inductive bias. Empirical studies, supported by theoretical analysis, show that our approach achieves better generalization performance with fewer communication rounds than the state-of-the-art.
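
The mechanism described above lends itself to a short illustration. Below is a minimal PyTorch sketch, not the paper's reference implementation: all names (Generator, server_update_generator, user_regularizer), dimensions, and the linear prediction heads are illustrative assumptions. It only shows how a server could distill the users' ensemble knowledge into a lightweight conditional generator without any data, and how a user could then use the broadcast generator as an inductive bias during local training.

```python
# Minimal sketch of data-free ensemble distillation for FL (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM, NOISE_DIM, NUM_CLASSES = 32, 16, 10  # assumed toy dimensions

class Generator(nn.Module):
    """Lightweight conditional generator: (noise, label) -> latent feature."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(NUM_CLASSES, NOISE_DIM)
        self.net = nn.Sequential(
            nn.Linear(2 * NOISE_DIM, 64), nn.ReLU(),
            nn.Linear(64, LATENT_DIM))

    def forward(self, y):
        eps = torch.randn(y.size(0), NOISE_DIM)               # sample noise
        return self.net(torch.cat([eps, self.embed(y)], dim=1))

def server_update_generator(generator, user_heads, label_weights,
                            steps=100, lr=1e-3):
    """Server-side, data-free distillation: generated features should be
    classified as their conditioning label by the weighted ensemble of the
    users' prediction heads (no raw data or proxy dataset involved)."""
    for head in user_heads:                                    # heads stay frozen
        for p in head.parameters():
            p.requires_grad_(False)
    opt = torch.optim.Adam(generator.parameters(), lr=lr)
    for _ in range(steps):
        y = torch.randint(0, NUM_CLASSES, (64,))
        z = generator(y)
        logits = sum(w[y].unsqueeze(1) * head(z)               # per-label weights
                     for head, w in zip(user_heads, label_weights))
        loss = F.cross_entropy(logits, y)
        opt.zero_grad(); loss.backward(); opt.step()

def user_regularizer(generator, local_head, batch_size=64):
    """Inductive-bias term added to a user's local objective: the local
    prediction head should also classify the broadcast generator's samples."""
    with torch.no_grad():
        y = torch.randint(0, NUM_CLASSES, (batch_size,))
        z = generator(y)
    return F.cross_entropy(local_head(z), y)

# Example wiring with 3 users whose classifiers end in a linear prediction head.
users = [nn.Linear(LATENT_DIM, NUM_CLASSES) for _ in range(3)]
weights = [torch.full((NUM_CLASSES,), 1.0 / 3) for _ in range(3)]
g = Generator()
server_update_generator(g, users, weights)
reg_loss = user_regularizer(g, users[0])                       # add to local loss
```

One design choice assumed in this sketch is that the generator produces latent features rather than raw inputs, which keeps it lightweight and therefore cheap to broadcast to users in every communication round.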
