Learning to Solve NLP Tasks in an Incremental Number of Languages

In real scenarios, a multilingual model trained to solve NLP tasks on a set of languages may be required to support new languages over time. Unfortunately, straightforward retraining on a dataset containing annotated examples for all the languages is both expensive and time-consuming, especially as the number of languages grows. Moreover, the original annotated material may no longer be available due to storage or business constraints, and retraining only on the new-language data inevitably results in Catastrophic Forgetting of previously acquired knowledge. We propose a Continual Learning strategy that updates a model to support new languages over time while maintaining consistent results on previously learned languages. We define a Teacher-Student framework in which the existing model "teaches" its knowledge of the languages it already supports to a student model, while the student is simultaneously trained on a new language. We report an experimental evaluation on several tasks, including Sentence Classification, Relational Learning, and Sequence Labeling.
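As a rough illustration of the Teacher-Student idea, the sketch below shows one plausible way to distill the existing (teacher) model into a student while fine-tuning on annotated data for the new language only, so no old-language annotations are needed. This is a minimal assumption-laden sketch, not the paper's exact formulation: the PyTorch setting, the helper name `continual_language_update`, and the hyperparameters `alpha` and `T` are illustrative.

```python
import copy
import torch
import torch.nn.functional as F

def continual_language_update(model, new_lang_loader, optimizer,
                              alpha=0.5, T=2.0, epochs=1):
    """Hypothetical sketch: the current model acts as a frozen teacher,
    while a copy (the student) is trained on the new language and kept
    close to the teacher's predictions to limit forgetting."""
    teacher = copy.deepcopy(model).eval()      # frozen snapshot of the old model
    for p in teacher.parameters():
        p.requires_grad_(False)

    student = model.train()                    # student continues learning
    for _ in range(epochs):
        for inputs, labels in new_lang_loader:  # annotated new-language batches
            student_logits = student(inputs)
            with torch.no_grad():
                teacher_logits = teacher(inputs)

            # supervised loss on the new language
            ce = F.cross_entropy(student_logits, labels)

            # distillation loss: match the teacher's temperature-softened
            # output distribution (Hinton-style knowledge distillation)
            kd = F.kl_div(
                F.log_softmax(student_logits / T, dim=-1),
                F.softmax(teacher_logits / T, dim=-1),
                reduction="batchmean",
            ) * (T * T)

            loss = alpha * ce + (1.0 - alpha) * kd
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

In this sketch the distillation term is computed on the new-language inputs themselves, reflecting the constraint that the original annotated material may no longer be available; `alpha` balances learning the new language against preserving the teacher's behavior on previously supported languages.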
