Multi-Task Supervised Pretraining for Neural Domain Adaptation

Two prevalent transfer learning approaches are used in recent work to improve the performance of neural networks on domains with small amounts of annotated data: multi-task learning, which trains the task of interest jointly with related auxiliary tasks to exploit their underlying similarities, and mono-task fine-tuning, in which the model's weights are initialized from a model pretrained on a large-scale labeled source domain and then fine-tuned on labeled data from the target domain (the domain of interest). In this paper, we propose a new approach that combines the advantages of both: a hierarchical model is trained across multiple tasks of a source domain and then fine-tuned on multiple tasks of the target domain. Our experiments on four tasks applied to the social media domain show that the proposed approach yields significant improvements on all tasks compared to both baseline approaches.

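The abstract describes multi-task pretraining on a source domain followed by multi-task fine-tuning on the target domain. The sketch below illustrates that training scheme in PyTorch under assumed details: the BiLSTM encoder, the task heads, the example tasks (POS tagging, chunking, NER), and all label-set sizes and hyper-parameters are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of multi-task supervised pretraining followed by multi-task
# fine-tuning. All architectural choices and sizes below are assumptions.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Shared word-level BiLSTM encoder reused across tasks and domains."""
    def __init__(self, vocab_size=10000, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True, batch_first=True)

    def forward(self, tokens):
        return self.lstm(self.embed(tokens))[0]  # (batch, seq, 2 * hidden_dim)

class MultiTaskTagger(nn.Module):
    """One shared encoder with one classification head per sequence-labeling task."""
    def __init__(self, task_label_sizes, hidden_dim=128):
        super().__init__()
        self.encoder = SharedEncoder(hidden_dim=hidden_dim)
        self.heads = nn.ModuleDict({
            task: nn.Linear(2 * hidden_dim, n_labels)
            for task, n_labels in task_label_sizes.items()
        })

    def forward(self, tokens, task):
        return self.heads[task](self.encoder(tokens))  # (batch, seq, n_labels)

def train_multitask(model, batches, epochs=1, lr=1e-3):
    """Alternate over task-specific batches; each batch updates the shared
    encoder together with its own task head."""
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for task, tokens, labels in batches:
            logits = model(tokens, task)
            loss = loss_fn(logits.view(-1, logits.size(-1)), labels.view(-1))
            optim.zero_grad()
            loss.backward()
            optim.step()

# 1) Multi-task pretraining on the large labeled source domain (e.g. newswire).
source_tasks = {"pos": 45, "chunk": 23, "ner": 9}  # assumed label-set sizes
model = MultiTaskTagger(source_tasks)
# source_batches = ...  # iterable of (task, token_ids, label_ids) from the source domain
# train_multitask(model, source_batches, epochs=5)

# 2) Multi-task fine-tuning on the small target domain (e.g. social media),
#    reusing the pretrained encoder and re-initializing the heads for the
#    target label sets.
target_tasks = {"pos": 40, "chunk": 23, "ner": 11}  # assumed label-set sizes
finetuned = MultiTaskTagger(target_tasks)
finetuned.encoder.load_state_dict(model.encoder.state_dict())
# target_batches = ...  # iterable of (task, token_ids, label_ids) from the target domain
# train_multitask(finetuned, target_batches, epochs=5)
```

In this sketch only the shared encoder's weights transfer across domains; the task-specific heads are re-initialized because the target label sets may differ from those of the source domain.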