SC-LSTM: Learning Task-Specific Representations in Multi-Task Learning for Sequence Labeling

Multi-task learning (MTL) has recently been studied for sequence labeling. Typically, auxiliary tasks are selected specifically to improve the performance of a single target task. Jointly learning multiple tasks in a way that benefits all of them simultaneously can increase the utility of MTL. To this end, we propose a new LSTM cell that contains both shared parameters, which learn from all tasks, and task-specific parameters, which learn task-specific information. We name this cell the Shared-Cell Long Short-Term Memory (SC-LSTM). Experimental results on three sequence labeling benchmarks (named-entity recognition, text chunking, and part-of-speech tagging) demonstrate the effectiveness of our SC-LSTM cell.
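
The abstract only describes the cell at a high level, so the following is a minimal PyTorch sketch of one way such a shared/task-specific split could be realized: a shared LSTM cell updated by every task, a per-task LSTM cell updated only by its own task, and a gate that blends the two hidden states. The class name SharedCellLSTM, the use of nn.LSTMCell for both parameter groups, and the mix_gate blending are illustrative assumptions, not the paper's actual SC-LSTM equations.

```python
# Hypothetical sketch of an LSTM cell combining shared and task-specific
# parameters, in the spirit of the SC-LSTM described in the abstract.
import torch
import torch.nn as nn


class SharedCellLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_tasks):
        super().__init__()
        # Shared parameters: one LSTM cell trained on all tasks.
        self.shared_cell = nn.LSTMCell(input_size, hidden_size)
        # Task-specific parameters: one extra cell per task (assumption).
        self.task_cells = nn.ModuleList(
            [nn.LSTMCell(input_size, hidden_size) for _ in range(num_tasks)]
        )
        # Gate that mixes shared and task-specific hidden states (assumption).
        self.mix_gate = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, x, task_id, shared_state, task_state):
        # x: (batch, input_size); each state is an (h, c) tuple.
        h_s, c_s = self.shared_cell(x, shared_state)
        h_t, c_t = self.task_cells[task_id](x, task_state)
        g = torch.sigmoid(self.mix_gate(torch.cat([h_s, h_t], dim=-1)))
        h = g * h_s + (1.0 - g) * h_t  # blend shared and task-specific outputs
        return h, (h_s, c_s), (h_t, c_t)


# Usage example with hypothetical dimensions.
if __name__ == "__main__":
    batch, input_size, hidden_size = 4, 100, 200
    cell = SharedCellLSTM(input_size, hidden_size, num_tasks=3)
    x = torch.randn(batch, input_size)
    zeros = torch.zeros(batch, hidden_size)
    h, shared_state, task_state = cell(
        x, task_id=0, shared_state=(zeros, zeros), task_state=(zeros, zeros)
    )
    print(h.shape)  # torch.Size([4, 200])
```

In a multi-task setting, the shared cell would receive gradients from every task's labeled data, while each task-specific cell (and its downstream tagger) would be updated only on that task's batches.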
