Continual Learning for Named Entity Recognition

Named Entity Recognition (NER) is a vital task in many NLP applications. However, in many real-world scenarios (e.g., voice-enabled assistants), new named entity types are frequently introduced, requiring NER models to be re-trained to support them. Re-annotating the original training data for the new entity types can be costly, or even impossible when storage limitations or security concerns restrict access to that data, and annotating a new dataset for all of the entity types becomes impractical and error-prone as their number increases. To tackle this problem, we introduce a novel Continual Learning approach for NER, which requires new training material to be annotated only for the new entity types. To preserve the knowledge previously learned by the model, we exploit the Knowledge Distillation (KD) framework: the existing NER model acts as the teacher for a new NER model (the student), which learns the new entity types from the new training material and retains knowledge of the old entity types by imitating the teacher’s outputs on this new training set. Our experiments show that this approach allows the student model to “progressively” learn to identify new entity types without forgetting the previously learned ones. We also compare against multiple strong baselines to demonstrate that our approach is superior for continually updating an NER model.

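Concretely, the teacher-student update described above can be thought of as combining a supervised loss on the newly annotated entity types with a distillation loss that keeps the student’s predictions for the old types close to the teacher’s. The following PyTorch snippet is a minimal sketch of such an objective, not the paper’s exact formulation: the tensor names, the temperature T, the weighting alpha, and the way old and new tag columns are selected are all illustrative assumptions.

```python
import torch.nn.functional as F

def continual_ner_loss(student_logits, teacher_logits, new_labels,
                       old_tag_ids, new_tag_ids, T=2.0, alpha=0.5):
    """
    student_logits, teacher_logits: (batch, seq_len, num_tags) token-level scores.
    new_labels: (batch, seq_len) gold indices into the new tag subset (-100 = ignore).
    old_tag_ids / new_tag_ids: lists of column indices for the old and new tag sets.
    """
    # Distillation term: the student imitates the teacher's soft distribution over
    # the previously learned tags, which is what preserves the old entity types.
    teacher_probs = F.softmax(teacher_logits[..., old_tag_ids] / T, dim=-1)
    student_logp = F.log_softmax(student_logits[..., old_tag_ids] / T, dim=-1)
    kd_loss = F.kl_div(student_logp, teacher_probs, reduction="batchmean") * (T * T)

    # Supervised term: ordinary cross-entropy on the newly annotated entity types.
    new_logits = student_logits[..., new_tag_ids]
    ce_loss = F.cross_entropy(new_logits.reshape(-1, new_logits.size(-1)),
                              new_labels.reshape(-1), ignore_index=-100)

    # Weighted sum; alpha balances retention of old types against learning new ones.
    return alpha * kd_loss + (1.0 - alpha) * ce_loss
```

In a training step, both teacher and student logits would be computed on the same new training batch, with gradients flowing only through the student, mirroring the abstract’s description of the student imitating the teacher’s outputs on the new training set.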