Multi-Task Learning for Semantic Relatedness and Textual Entailment

Recently, many deep learning models have been proposed and successfully applied to a variety of Natural Language Processing (NLP) tasks. However, most of these models are trained with single-task supervised learning and do not exploit the correlation between related tasks. Motivated by this observation, in this paper we implement a multi-task learning model that jointly learns two related NLP tasks, semantic relatedness and textual entailment, and we conduct experiments to evaluate whether learning these tasks jointly improves performance compared with learning them individually. In addition, we compare our model with state-of-the-art approaches, including multi-task learning, transfer learning, unsupervised learning, and feature-based traditional machine learning models. This paper aims to 1) show the advantage of multi-task learning over single-task learning when training related NLP tasks, 2) illustrate the influence of different encoding structures on the proposed single- and multi-task learning models, and 3) compare the performance of multi-task learning with other learning models in the literature on the textual entailment and semantic relatedness tasks.
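
To make the joint-learning setup concrete, the following is a minimal sketch (not the authors' exact architecture) of hard-parameter-sharing multi-task learning for a sentence-pair setting: a shared BiLSTM sentence encoder feeds two task-specific heads, a regression head for semantic relatedness and a classification head for textual entailment, and the two losses are summed for a single backward pass. All layer sizes, the equal loss weighting, and the SICK-style label ranges (relatedness score in [1, 5], three entailment classes) are illustrative assumptions.

import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Shared sentence encoder: embedding + BiLSTM + max-pooling over time."""
    def __init__(self, vocab_size=20000, embed_dim=300, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> sentence vector (batch, 2 * hidden_dim)
        embedded = self.embedding(token_ids)
        outputs, _ = self.bilstm(embedded)
        return outputs.max(dim=1).values

class MultiTaskModel(nn.Module):
    """One shared encoder, two task heads over the same sentence-pair features."""
    def __init__(self, encoder, sent_dim=512, num_entail_classes=3):
        super().__init__()
        self.encoder = encoder
        # Pair features: [u, v, |u - v|, u * v]
        pair_dim = 4 * sent_dim
        self.relatedness_head = nn.Sequential(
            nn.Linear(pair_dim, 128), nn.ReLU(), nn.Linear(128, 1))
        self.entailment_head = nn.Sequential(
            nn.Linear(pair_dim, 128), nn.ReLU(),
            nn.Linear(128, num_entail_classes))

    def forward(self, premise_ids, hypothesis_ids):
        u = self.encoder(premise_ids)
        v = self.encoder(hypothesis_ids)
        pair = torch.cat([u, v, torch.abs(u - v), u * v], dim=-1)
        return self.relatedness_head(pair).squeeze(-1), self.entailment_head(pair)

# Joint training step: sum the two task losses (equal weighting is an assumption).
model = MultiTaskModel(SharedEncoder())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
mse, ce = nn.MSELoss(), nn.CrossEntropyLoss()

premise = torch.randint(1, 20000, (8, 12))       # dummy batch of token ids
hypothesis = torch.randint(1, 20000, (8, 12))
relatedness_gold = torch.rand(8) * 4 + 1          # assumed relatedness scores in [1, 5]
entailment_gold = torch.randint(0, 3, (8,))       # assumed 3-way entailment labels

score_pred, entail_logits = model(premise, hypothesis)
loss = mse(score_pred, relatedness_gold) + ce(entail_logits, entailment_gold)
optimizer.zero_grad()
loss.backward()
optimizer.step()

In this shared-encoder design, gradients from both tasks update the same embedding and BiLSTM parameters, which is the mechanism by which training one task can act as an inductive bias for the other.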
