Pretrained Models for Multilingual Federated Learning

Since the advent of Federated Learning (FL), research has applied it to natural language processing (NLP) tasks. Despite a plethora of papers on FL for NLP, no previous work has studied how multilingual text impacts FL algorithms. Furthermore, multilingual text provides an interesting avenue to examine the impact of non-IID text (e.g., different languages) on FL in naturally occurring data. We explore three multilingual language tasks (language modeling, machine translation, and text classification) using a range of federated and non-federated learning algorithms. Our results show that using pretrained models reduces the negative effects of FL, helping federated models perform on par with, or better than, centralized (no-privacy) learning, even under non-IID partitioning.
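
To make the setup concrete, below is a minimal sketch of weighted federated averaging (FedAvg), the standard FL baseline, under the non-IID partitioning described above, where each client holds data from a different language. The toy linear model and synthetic batches are illustrative stand-ins: in the paper's setting the global model would instead be initialized from pretrained multilingual weights, and each client would hold real text in its own language. All function and variable names here are assumptions for illustration, not the paper's code.

```python
# Minimal FedAvg sketch: clients partitioned by language (non-IID).
# A toy linear model stands in for a pretrained multilingual Transformer;
# in practice the global model would start from pretrained weights.
import copy
import torch
from torch import nn

def local_update(global_model, data, epochs=1, lr=0.01):
    """One client's local SGD pass, starting from the current global weights."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in data:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict()

def fed_avg(global_model, client_datasets, rounds=10):
    """Each round: run local updates, then average weights by client data size."""
    for _ in range(rounds):
        sizes = [sum(len(y) for _, y in data) for data in client_datasets]
        states = [local_update(global_model, data) for data in client_datasets]
        total = sum(sizes)
        with torch.no_grad():
            avg_state = {
                k: sum(s[k] * (n / total) for s, n in zip(states, sizes))
                for k in states[0]
            }
        global_model.load_state_dict(avg_state)
    return global_model

if __name__ == "__main__":
    torch.manual_seed(0)
    # Two hypothetical "language" clients with differently shifted feature
    # distributions, a crude proxy for per-language non-IID data.
    make_batch = lambda shift: (torch.randn(32, 16) + shift,
                                torch.randint(0, 4, (32,)))
    clients = [[make_batch(0.0)], [make_batch(2.0)]]
    model = nn.Linear(16, 4)  # stand-in for pretrained encoder + task head
    fed_avg(model, clients, rounds=5)
```

The centralized (no-privacy) baseline in the abstract corresponds to pooling all clients' data and training one model directly; the comparison in the paper is between that setting and federated variants like the averaging loop above.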
