Adversarial Learning for Multi-Task Sequence Labeling With Attention Mechanism

Driven by the requirements of natural language applications, multi-task sequence labeling methods offer clear benefits over single-task sequence labeling methods. Many state-of-the-art multi-task sequence labeling methods have been proposed recently, yet several issues remain to be resolved, including (C1) exploring a more general relationship between tasks, (C2) extracting task-shared knowledge without task-specific contamination, and (C3) merging the task-shared knowledge into each task appropriately. To address these challenges, we propose MTAA, a symmetric multi-task sequence labeling model that performs an arbitrary number of tasks simultaneously. MTAA extracts the knowledge shared among tasks through adversarial learning and integrates the proposed multi-representation fusion attention mechanism to merge feature representations. We evaluate MTAA on two widely used datasets: CoNLL-2003 and OntoNotes 5.0. Experimental results show that our proposed model outperforms the latest methods on named entity recognition and syntactic chunking by a large margin and achieves state-of-the-art results on part-of-speech tagging.
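The abstract only summarizes the architecture, so the PyTorch sketch below is illustrative rather than a reproduction of MTAA. It assumes the common adversarial multi-task layout: a shared BiLSTM encoder plus one private encoder per task, a task discriminator trained through a gradient-reversal layer to make the shared features task-invariant, and a simple attention over the shared and private token representations standing in for the paper's multi-representation fusion attention. All class and parameter names (`AdversarialMultiTaskTagger`, `adv_lambda`, ...) are hypothetical, and the per-task linear heads would typically be replaced by CRF layers as in the paper's sequence labeling setting.

```python
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass, negated gradient backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class AdversarialMultiTaskTagger(nn.Module):
    """Sketch of an adversarial multi-task tagger with shared/private encoders,
    a task discriminator behind gradient reversal, and attention-based fusion."""

    def __init__(self, vocab_size, emb_dim, hidden_dim, task_label_sizes):
        super().__init__()
        num_tasks = len(task_label_sizes)
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # One shared encoder for all tasks, one private encoder per task.
        self.shared_enc = nn.LSTM(emb_dim, hidden_dim // 2,
                                  bidirectional=True, batch_first=True)
        self.private_enc = nn.ModuleList(
            nn.LSTM(emb_dim, hidden_dim // 2, bidirectional=True, batch_first=True)
            for _ in range(num_tasks)
        )
        # Discriminator guesses the task from shared features; gradient reversal
        # pushes the shared encoder toward task-invariant representations.
        self.discriminator = nn.Linear(hidden_dim, num_tasks)
        # Attention that fuses shared and private token representations.
        self.fusion_attn = nn.Linear(hidden_dim, 1)
        # Per-task tag classifiers (a CRF would normally sit on top of these).
        self.tag_heads = nn.ModuleList(nn.Linear(hidden_dim, n) for n in task_label_sizes)

    def forward(self, tokens, task_id, adv_lambda=1.0):
        emb = self.embed(tokens)                         # (B, T, E)
        shared, _ = self.shared_enc(emb)                 # (B, T, H)
        private, _ = self.private_enc[task_id](emb)      # (B, T, H)

        # Adversarial branch: pooled shared features -> task logits.
        pooled = shared.mean(dim=1)
        task_logits = self.discriminator(GradReverse.apply(pooled, adv_lambda))

        # Attention-weighted fusion of the shared and private representations.
        stacked = torch.stack([shared, private], dim=2)          # (B, T, 2, H)
        weights = torch.softmax(self.fusion_attn(stacked), dim=2)
        fused = (weights * stacked).sum(dim=2)                   # (B, T, H)

        tag_logits = self.tag_heads[task_id](fused)              # (B, T, num_tags)
        return tag_logits, task_logits
```

In training, each batch from a given task would contribute a tagging loss on `tag_logits` plus a cross-entropy term on `task_logits`; the reversed gradient of the latter is what encourages the shared encoder to carry only task-agnostic knowledge.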
