Multitask learning approach for understanding the relationship between two sentences

Abstract Understanding the relationship between two sentences through tasks such as natural language inference, paraphrase detection, semantic textual similarity, and semantic relatedness is a fundamental step toward natural language understanding. We propose an approach that infers the relationship between two sentences within a multitask framework, generating a universal representation of that relationship. Our model consists of a universal layer shared across all tasks, with several task-specific layers on top for each task. To generate the universal representation, we employ the enhanced sequential inference model, which is based on deep learning and soft alignment techniques. The task-specific layers are multilayer perceptrons. The main feature of the proposed approach is that a single encoder can model various relationships between sentences simultaneously across multiple tasks. When we evaluated our approach on four public datasets covering four different tasks concerning the relationship between two sentences, it outperformed state-of-the-art methods on two datasets and performed competitively on the other two. Further investigation showed that the proposed model captures comprehensive information together with task-specific knowledge to infer semantic similarity. This detailed analysis supports the claim that the proposed approach is robust across all semantic inference tasks using a single model.
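The architecture described above (a shared "universal" layer feeding several task-specific multilayer perceptrons) can be sketched as follows. This is a minimal illustration only: the dimensions, task names, and the interaction features are hypothetical stand-ins, and the shared layer here is a single projection rather than the full ESIM-based encoder the paper uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Shared "universal" layer parameters, reused by every task
# (a stand-in for the ESIM-based encoder; sizes are illustrative).
W_shared = rng.normal(size=(8, 16)) * 0.1

# One task-specific MLP head per task (hypothetical task names
# mirroring the four tasks evaluated in the paper).
tasks = ["nli", "paraphrase", "sts", "relatedness"]
heads = {t: rng.normal(size=(16, 3)) * 0.1 for t in tasks}

def encode_pair(a_vec, b_vec):
    # Combine the two sentence vectors with simple interaction
    # features, then project through the shared layer.
    features = np.concatenate([a_vec * b_vec, np.abs(a_vec - b_vec)])
    return relu(features @ W_shared)

def predict(a_vec, b_vec, task):
    # The same shared representation feeds every task-specific head,
    # so one encoder serves all four relationship tasks.
    rep = encode_pair(a_vec, b_vec)
    return rep @ heads[task]

# Toy 4-dimensional sentence vectors; the interaction features are
# 8-dimensional, matching W_shared's input size.
a = rng.normal(size=4)
b = rng.normal(size=4)
for t in tasks:
    print(t, predict(a, b, t).shape)
```

The key design point mirrored here is that gradients from every task's head would flow into the same `W_shared`, which is what lets the shared layer accumulate knowledge across tasks.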
