Aggregate or Not? Exploring Where to Privatize in DNN-Based Federated Learning Under Different Non-IID Scenes

Although federated learning (FL) has recently been proposed for efficient distributed training and data privacy protection, it still encounters many obstacles. One of these is the statistical heterogeneity that naturally exists among clients, making local data distributions not independently and identically distributed (i.e., non-iid), which poses challenges for model aggregation and personalization. For FL with a deep neural network (DNN), privatizing some layers is a simple yet effective solution to non-iid problems. However, which layers should we privatize to facilitate the learning process? Do different categories of non-iid scenes prefer different privatization strategies? Can we automatically learn the most appropriate privatization strategy during FL? In this paper, we answer these questions through extensive experimental studies on several FL benchmarks. First, we present detailed statistics of these benchmarks and categorize them into covariate shift and label shift non-iid scenes. Then, we investigate both coarse-grained and fine-grained network splits and explore whether the preferred privatization strategy is related to the specific category of a non-iid scene. Our findings are surprising: for example, privatizing the base layers can boost performance even in label shift non-iid scenes, which contradicts some natural conjectures. We also find that none of these privatization strategies improves performance on the Shakespeare benchmark, and we conjecture that Shakespeare may not be a severely non-iid scene. Finally, we propose several approaches that automatically learn where to aggregate via cross-stitch units, soft attention, and hard selection. We advocate the proposed methods as a preliminary step toward exploring where to privatize for a novel non-iid scene.
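
To make the idea concrete, below is a minimal sketch (our own illustration, not the paper's released code) of a FedAvg-style server step that averages only the shared layers and keeps privatized layers local, together with a soft-attention gate that blends global and private weights. All names here (`fedavg_with_private_layers`, `private_keys`, `soft_mix`) are hypothetical; a hard-selection variant could sample the gate with a Gumbel-Softmax instead of using a sigmoid.

```python
# Minimal sketch of layer-wise privatization in FedAvg-style aggregation.
# Function and parameter names are illustrative assumptions, not the
# paper's actual API.
import copy
from typing import Dict, List

import torch


def fedavg_with_private_layers(
    client_states: List[Dict[str, torch.Tensor]],
    private_keys: List[str],
) -> List[Dict[str, torch.Tensor]]:
    """Average shared parameters across clients; keep private layers local."""
    n = len(client_states)
    # Average every tensor whose name is not marked private.
    shared_avg = {
        name: sum(state[name] for state in client_states) / n
        for name in client_states[0]
        if not any(name.startswith(p) for p in private_keys)
    }
    # Each client keeps its own copy of the privatized layers.
    new_states = []
    for state in client_states:
        merged = copy.deepcopy(state)  # private layers stay as-is
        merged.update({k: v.clone() for k, v in shared_avg.items()})
        new_states.append(merged)
    return new_states


def soft_mix(global_w: torch.Tensor, local_w: torch.Tensor,
             gate_logit: torch.Tensor) -> torch.Tensor:
    """Soft-attention blending: a learnable gate in [0, 1] interpolates
    between the aggregated global weights and the client's private
    weights; hard selection would sample the gate instead."""
    alpha = torch.sigmoid(gate_logit)
    return alpha * global_w + (1.0 - alpha) * local_w


# Usage sketch: privatize the base (feature-extractor) layers and
# aggregate the rest, e.g.:
#   new_states = fedavg_with_private_layers(states, private_keys=["conv1"])
```

Cross-stitch units can be seen as a generalization of `soft_mix`, learning a small per-layer mixing matrix between the shared and private branches rather than a single scalar gate.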
