Towards Fast and Unified Transfer Learning Architectures for Sequence Labeling

Sequence labeling systems have advanced steadily with neural architectures over the past several years, but this performance depends on large amounts of annotated data. We focus on Named Entity Recognition (NER) on clinical notes, one of the most fundamental and critical problems in medical text analysis, and center our work on effectively adapting these neural architectures to low-resource settings using parameter transfer methods. We complement a standard hierarchical NER model with a general transfer learning framework, the Tunable Transfer Network (TTN), which shares parameters between the source and target tasks, and show scores significantly above the baseline architecture. Our best TTN model achieves a 2-5% improvement over the pre-trained language model BERT as well as its multi-task extension MT-DNN in low-resource settings. However, the proposed sharing scheme requires an exponential search over sets of tied parameters to find an optimal configuration. To avoid this exhaustive search, we propose Dynamic Transfer Networks (DTN), a gated architecture that learns the appropriate parameter sharing scheme between source and target datasets. DTN matches the improvements of the optimally configured transfer learning framework with a single training run, effectively removing the need for an exponential search.
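
The gating idea admits a compact illustration. Below is a minimal PyTorch sketch of a DTN-style gated sharing layer: a shared encoder and a task-specific encoder process the same input, and a learned sigmoid gate decides, per position, how much of each representation to keep. The class and parameter names (GatedSharingLayer, hidden_dim, the BiLSTM encoders) are illustrative assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class GatedSharingLayer(nn.Module):
        # Hypothetical DTN-style layer: a learned gate mixes a shared
        # encoder (updated by both source and target tasks) with a
        # private, task-specific encoder, so the degree of sharing is
        # learned rather than fixed by an exhaustive search.
        def __init__(self, input_dim, hidden_dim):
            super().__init__()
            self.shared = nn.LSTM(input_dim, hidden_dim,
                                  batch_first=True, bidirectional=True)
            self.private = nn.LSTM(input_dim, hidden_dim,
                                   batch_first=True, bidirectional=True)
            # Gate sees both representations and emits weights in (0, 1).
            self.gate = nn.Linear(4 * hidden_dim, 2 * hidden_dim)

        def forward(self, x):
            # x: (batch, seq_len, input_dim)
            h_s, _ = self.shared(x)    # (batch, seq_len, 2 * hidden_dim)
            h_p, _ = self.private(x)   # (batch, seq_len, 2 * hidden_dim)
            g = torch.sigmoid(self.gate(torch.cat([h_s, h_p], dim=-1)))
            # g -> 1 favors the shared path, g -> 0 the task-specific one.
            return g * h_s + (1 - g) * h_p

    # Usage: one private encoder per task; the shared encoder is reused.
    layer = GatedSharingLayer(input_dim=100, hidden_dim=128)
    out = layer(torch.randn(8, 25, 100))   # -> (8, 25, 256)

On this reading, the binary tie/untie choices that TTN must search exhaustively become continuous gate values fit by backpropagation in a single training run.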
