Transfer Learning for Sequence Labeling Using Source Model and Target Data

In this paper, we propose an approach for transferring the knowledge of a neural sequence-labeling model, learned on a source domain, to a new model trained on a target domain where new label categories appear. Our transfer learning (TL) techniques adapt the source model using the target data and the new categories, without access to the source data. Our solution consists of adding new neurons to the output layer of the target model and transferring parameters from the source model, which are then fine-tuned on the target data. Additionally, we propose a neural adapter that learns the difference between the source and target label distributions, providing additional important information to the target model. Our experiments on Named Entity Recognition show that (i) the knowledge learned in the source model can be effectively transferred when the target data contains new categories and (ii) our neural adapter further improves such transfer.
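
The output-layer extension described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the label sets, sizes, and initialization scheme are hypothetical, and the example shows only the parameter-transfer step (copying the source model's output-layer rows and appending freshly initialized rows for the new categories) that precedes fine-tuning on the target data.

```python
import numpy as np

rng = np.random.default_rng(0)

hidden_size = 4
src_labels = ["O", "B-PER", "I-PER"]   # labels known to the source model (example)
new_labels = ["B-LOC", "I-LOC"]        # categories appearing only in the target data (example)

# Stand-in for the trained source-model output-layer parameters:
# one row (one output neuron) per source label.
W_src = rng.standard_normal((len(src_labels), hidden_size))
b_src = rng.standard_normal(len(src_labels))

# Build the target output layer: copy the source rows unchanged and
# append small randomly initialized rows (new neurons) for the new labels.
W_tgt = np.vstack([W_src, 0.01 * rng.standard_normal((len(new_labels), hidden_size))])
b_tgt = np.concatenate([b_src, np.zeros(len(new_labels))])

# The transferred rows are identical to the source parameters; the whole
# extended layer (W_tgt, b_tgt) would then be fine-tuned on the target data.
assert np.allclose(W_tgt[: len(src_labels)], W_src)
```

Copying the source rows verbatim preserves the source model's learned decision boundaries for the old labels, while the small-magnitude initialization of the new rows lets fine-tuning shape the new categories without immediately disrupting the transferred ones.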
