Cross-lingual Continual Learning

The longstanding goal of multilingual learning has been to develop a universal cross-lingual model that can withstand changes in multilingual data distributions. Considerable work has gone into adapting such multilingual models to unseen target languages. However, most work in this direction focuses on the standard one-hop transfer-learning pipeline from source to target languages, whereas in realistic scenarios new languages can be incorporated at any time, in a sequential manner. In this paper, we present a principled Cross-lingual Continual Learning (CCL) evaluation paradigm, in which we analyze different categories of approaches used to continually adapt to emerging data from different languages. We provide insights into what makes multilingual sequential learning particularly challenging. To surmount these challenges, we benchmark a representative set of cross-lingual continual learning algorithms and analyze their knowledge preservation, accumulation, and generalization capabilities against baselines on carefully curated data streams; a sketch of this evaluation loop is given below. The implications of this analysis include a recipe for how to measure and balance different cross-lingual continual learning desiderata, which go beyond conventional transfer learning.
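The following is a minimal sketch of the kind of sequential evaluation loop the paradigm describes, not the paper's exact protocol. The function and argument names (evaluate_continual_stream, train_on, evaluate) and the specific metric definitions are illustrative assumptions: the model is fine-tuned on one language at a time, and after each step we measure how well it preserves earlier languages (forgetting), accumulates knowledge over the stream, and generalizes zero-shot to languages not yet seen.

```python
# Hypothetical sketch of a cross-lingual continual learning evaluation loop.
# The metric definitions follow common continual learning conventions and are
# assumptions, not the paper's exact formulas.

from typing import Callable, Dict, List


def evaluate_continual_stream(
    languages: List[str],                    # e.g. ["en", "de", "hi", "th"]
    train_on: Callable[[str], None],         # fine-tunes the model on one language
    evaluate: Callable[[str], float],        # returns task accuracy for one language
) -> Dict[str, float]:
    """Sequentially adapt to each language and track the CCL desiderata."""
    n = len(languages)
    # acc[i][j]: accuracy on language j right after training on the i-th language.
    acc = [[0.0] * n for _ in range(n)]

    for i, lang in enumerate(languages):
        train_on(lang)
        for j, eval_lang in enumerate(languages):
            acc[i][j] = evaluate(eval_lang)

    # Knowledge preservation: average drop on a language between the step at
    # which it was learned and the end of the stream (i.e. forgetting).
    forgetting = (
        sum(acc[i][i] - acc[n - 1][i] for i in range(n - 1)) / (n - 1)
        if n > 1 else 0.0
    )

    # Knowledge accumulation: average accuracy over all languages at the end.
    accumulation = sum(acc[n - 1]) / n

    # Generalization: average accuracy, at each step, on languages that have
    # not yet been seen (zero-shot transfer from the languages trained so far).
    unseen = [acc[i][j] for i in range(n) for j in range(i + 1, n)]
    generalization = sum(unseen) / len(unseen) if unseen else 0.0

    return {
        "forgetting": forgetting,
        "final_average_accuracy": accumulation,
        "zero_shot_generalization": generalization,
    }
```

Under this framing, a stronger continual learning method keeps forgetting low without sacrificing final average accuracy or zero-shot generalization; balancing these three quantities is what the desiderata mentioned above refer to.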
