Adversarial Learning for Multi-Task Sequence Labeling With Attention Mechanism

Driven by the requirements of natural language applications, multi-task sequence labeling methods offer clear benefits over single-task sequence labeling methods. Many state-of-the-art multi-task sequence labeling methods have been proposed recently, yet several issues remain to be resolved, including (C1) exploring a more general relationship between tasks, (C2) extracting task-shared knowledge without task-specific contamination, and (C3) merging the task-shared knowledge into each task appropriately. To address these challenges, we propose MTAA, a symmetric multi-task sequence labeling model that performs an arbitrary number of tasks simultaneously. MTAA extracts the knowledge shared among tasks through adversarial learning and integrates the proposed multi-representation fusion attention mechanism to merge feature representations. We evaluate MTAA on two widely used datasets: CoNLL-2003 and OntoNotes 5.0. Experimental results show that our proposed model outperforms the latest methods on named entity recognition and syntactic chunking by a large margin and achieves state-of-the-art results on part-of-speech tagging.
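The abstract only summarizes the architecture, so the PyTorch sketch below is illustrative rather than a reproduction of MTAA. It assumes the common adversarial multi-task layout: a shared BiLSTM encoder plus one private encoder per task, a task discriminator trained through a gradient-reversal layer to make the shared features task-invariant, and a simple attention over the shared and private token representations standing in for the paper's multi-representation fusion attention. All class and parameter names (`AdversarialMultiTaskTagger`, `adv_lambda`, ...) are hypothetical, and the per-task linear heads would typically be replaced by CRF layers as in the paper's sequence labeling setting.

```python
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass, negated gradient backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class AdversarialMultiTaskTagger(nn.Module):
    """Sketch of an adversarial multi-task tagger with shared/private encoders,
    a task discriminator behind gradient reversal, and attention-based fusion."""

    def __init__(self, vocab_size, emb_dim, hidden_dim, task_label_sizes):
        super().__init__()
        num_tasks = len(task_label_sizes)
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # One shared encoder for all tasks, one private encoder per task.
        self.shared_enc = nn.LSTM(emb_dim, hidden_dim // 2,
                                  bidirectional=True, batch_first=True)
        self.private_enc = nn.ModuleList(
            nn.LSTM(emb_dim, hidden_dim // 2, bidirectional=True, batch_first=True)
            for _ in range(num_tasks)
        )
        # Discriminator guesses the task from shared features; gradient reversal
        # pushes the shared encoder toward task-invariant representations.
        self.discriminator = nn.Linear(hidden_dim, num_tasks)
        # Attention that fuses shared and private token representations.
        self.fusion_attn = nn.Linear(hidden_dim, 1)
        # Per-task tag classifiers (a CRF would normally sit on top of these).
        self.tag_heads = nn.ModuleList(nn.Linear(hidden_dim, n) for n in task_label_sizes)

    def forward(self, tokens, task_id, adv_lambda=1.0):
        emb = self.embed(tokens)                         # (B, T, E)
        shared, _ = self.shared_enc(emb)                 # (B, T, H)
        private, _ = self.private_enc[task_id](emb)      # (B, T, H)

        # Adversarial branch: pooled shared features -> task logits.
        pooled = shared.mean(dim=1)
        task_logits = self.discriminator(GradReverse.apply(pooled, adv_lambda))

        # Attention-weighted fusion of the shared and private representations.
        stacked = torch.stack([shared, private], dim=2)          # (B, T, 2, H)
        weights = torch.softmax(self.fusion_attn(stacked), dim=2)
        fused = (weights * stacked).sum(dim=2)                   # (B, T, H)

        tag_logits = self.tag_heads[task_id](fused)              # (B, T, num_tags)
        return tag_logits, task_logits
```

In training, each batch from a given task would contribute a tagging loss on `tag_logits` plus a cross-entropy term on `task_logits`; the reversed gradient of the latter is what encourages the shared encoder to carry only task-agnostic knowledge.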
