A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks

Transfer and multi-task learning have traditionally focused on either a single source-target pair or very few, similar tasks. Ideally, the linguistic levels of morphology, syntax, and semantics would benefit each other by being trained in a single model. We introduce a joint many-task model together with a strategy for successively growing its depth to solve increasingly complex tasks. Higher layers include shortcut connections to lower-level task predictions to reflect linguistic hierarchies. A simple regularization term allows all model weights to be optimized for one task's loss without exhibiting catastrophic interference with the other tasks. Our single end-to-end model obtains state-of-the-art or competitive results on five tasks spanning tagging, parsing, semantic relatedness, and textual entailment.
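The abstract names two mechanisms: shortcut connections that feed lower-level task predictions into higher layers, and a "successive" regularization term of the form δ‖θ − θ′‖², where θ′ is a snapshot of the parameters taken before the current task's training epoch. The page itself contains no code, so the following is a minimal PyTorch sketch of those two ideas under stated assumptions: the class and function names, layer sizes, and the two-task truncation (POS tagging feeding chunking, rather than the paper's full five-task stack) are illustrative, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of (1) stacked task layers with
# shortcut connections carrying lower-level label predictions upward, and
# (2) successive L2 regularization against a pre-epoch parameter snapshot.
import torch
import torch.nn as nn


class JointManyTaskSketch(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=100,
                 n_pos=45, n_chunk=23):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Layer 1: POS tagging over a bidirectional LSTM.
        self.pos_lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                                bidirectional=True)
        self.pos_out = nn.Linear(2 * hidden, n_pos)
        # Label embedding: turns soft POS predictions into a vector that
        # serves as the shortcut input to the next layer.
        self.pos_label_emb = nn.Linear(n_pos, emb_dim, bias=False)
        # Layer 2: chunking; its input concatenates the word embedding
        # (shortcut), layer-1 hidden states, and soft POS label embeddings.
        self.chunk_lstm = nn.LSTM(2 * emb_dim + 2 * hidden, hidden,
                                  batch_first=True, bidirectional=True)
        self.chunk_out = nn.Linear(2 * hidden, n_chunk)

    def forward(self, tokens):
        x = self.embed(tokens)                    # (B, T, emb_dim)
        h_pos, _ = self.pos_lstm(x)               # (B, T, 2*hidden)
        pos_logits = self.pos_out(h_pos)
        # Soft shortcut: probability-weighted sum of label embeddings.
        pos_soft = self.pos_label_emb(pos_logits.softmax(dim=-1))
        h_chunk, _ = self.chunk_lstm(
            torch.cat([x, h_pos, pos_soft], dim=-1))
        return pos_logits, self.chunk_out(h_chunk)


def successive_l2(model, prev_params, delta=1e-2):
    """L2 pull toward a snapshot of the parameters taken before the
    current task's training epoch. The paper applies this to selected
    parameter subsets; penalizing all parameters keeps the sketch short."""
    return delta * sum((p - prev_params[n]).pow(2).sum()
                       for n, p in model.named_parameters())
```

In use, one would cycle through the tasks from lowest to highest, snapshot the parameters between epochs (e.g. `prev = {n: p.detach().clone() for n, p in model.named_parameters()}`), and add `successive_l2` to each task's loss so that optimizing one task's objective does not erase what the earlier tasks learned.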
