Supervised Contextual Embeddings for Transfer Learning in Natural Language Processing Tasks

Pre-trained word embeddings are the primary method for transfer learning in several Natural Language Processing (NLP) tasks. Recent work has focused on using unsupervised techniques such as language modeling to obtain these embeddings. In contrast, this work focuses on extracting representations from multiple pre-trained supervised models, which enriches word embeddings with task- and domain-specific knowledge. Experiments performed in cross-task, cross-domain, and cross-lingual settings indicate that such supervised embeddings are helpful, especially in the low-resource setting, but the extent of the gains depends on the nature of the task and domain. We make our code publicly available.
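To make the idea concrete, below is a minimal sketch of how representations from frozen, pre-trained supervised encoders can be concatenated with standard word embeddings before being fed to a downstream task model. This is an illustration under assumed conventions, not the paper's actual implementation: the class name `SupervisedContextualEmbedder`, the toy encoders, and all dimensions are hypothetical, and a real setup would use encoders trained on supervised tasks such as NER or SRL.

```python
import torch
import torch.nn as nn


class SupervisedContextualEmbedder(nn.Module):
    """Illustrative sketch: enrich base word embeddings with hidden
    representations taken from frozen, pre-trained supervised encoders."""

    def __init__(self, base_embed: nn.Embedding, supervised_encoders):
        super().__init__()
        self.base_embed = base_embed
        # The supervised encoders are kept frozen; only their outputs are
        # reused as additional features for the downstream task.
        self.encoders = nn.ModuleList(supervised_encoders)
        for enc in self.encoders:
            for p in enc.parameters():
                p.requires_grad = False

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len)
        parts = [self.base_embed(token_ids)]
        for enc in self.encoders:
            with torch.no_grad():
                parts.append(enc(token_ids))  # (batch, seq_len, enc_dim)
        # Concatenate along the feature dimension to form enriched embeddings.
        return torch.cat(parts, dim=-1)


if __name__ == "__main__":
    vocab_size, base_dim, enc_dim = 1000, 50, 32
    base = nn.Embedding(vocab_size, base_dim)
    # Stand-ins for pre-trained supervised encoders; in practice these would
    # be e.g. BiLSTM taggers trained on NER or SRL data.
    enc_a = nn.Sequential(nn.Embedding(vocab_size, enc_dim))
    enc_b = nn.Sequential(nn.Embedding(vocab_size, enc_dim))
    embedder = SupervisedContextualEmbedder(base, [enc_a, enc_b])
    tokens = torch.randint(0, vocab_size, (2, 7))
    print(embedder(tokens).shape)  # torch.Size([2, 7, 114]) = 50 + 32 + 32
```

The enriched embeddings produced this way would then replace (or augment) the input layer of a target-task model; whether the extra supervised features help is, as the abstract notes, dependent on how close the source tasks and domains are to the target.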
