Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference

Natural Language Inference (NLI), also known as Recognizing Textual Entailment (RTE), is one of the most important problems in natural language processing. It requires a model to infer the logical relationship between two given sentences. While current approaches mostly focus on the interaction architecture between the sentences, in this paper we propose to transfer knowledge from important discourse markers to improve the quality of the NLI model. We observe that people often use discourse markers such as "so" or "but" to signal the logical relationship between two sentences. These words have deep connections with the meanings of the sentences they link, and can therefore be exploited to improve sentence representations. Moreover, we use reinforcement learning to optimize a new objective function, whose reward is defined by a property of the NLI datasets, in order to make full use of the label information. Experiments show that our method achieves state-of-the-art performance on several large-scale datasets.
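The two ideas above can be made concrete with a short sketch. Below is a minimal PyTorch illustration, not the paper's exact formulation: first, a pretraining head that predicts which discourse marker originally connected two sentences, so that the encoder can later be transferred to the NLI model; second, a REINFORCE-style loss whose reward is assumed to come from per-example annotator votes, a property of datasets such as SNLI. The marker inventory, the encoder interface, and the vote-based reward are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed, illustrative marker inventory (not necessarily the paper's set).
MARKERS = ["but", "so", "because", "although", "and"]

class DiscourseMarkerPredictor(nn.Module):
    """Pretraining head: given two sentence encodings, predict the
    discourse marker that joined them in raw text. The trained encoder
    is then transferred to the NLI model."""
    def __init__(self, encoder, hidden=300):
        super().__init__()
        self.encoder = encoder  # any module mapping a sentence to (batch, hidden)
        self.cls = nn.Linear(4 * hidden, len(MARKERS))

    def forward(self, s1, s2):
        h1, h2 = self.encoder(s1), self.encoder(s2)
        # Standard matching features: concatenation, difference, product.
        feats = torch.cat([h1, h2, (h1 - h2).abs(), h1 * h2], dim=-1)
        return self.cls(feats)

def reinforce_nli_loss(logits, annotator_votes):
    """REINFORCE-style loss. `annotator_votes` is a hypothetical (batch, 3)
    float tensor of per-class vote counts from the dataset's annotators;
    the reward for a sampled label is the fraction of annotators who chose it."""
    dist = torch.distributions.Categorical(F.softmax(logits, dim=-1))
    actions = dist.sample()  # sample a label instead of taking the argmax
    rewards = annotator_votes.gather(1, actions.unsqueeze(1)).squeeze(1)
    rewards = rewards / annotator_votes.sum(dim=1)
    # Policy gradient: maximize expected reward (no variance-reducing baseline here).
    return -(dist.log_prob(actions) * rewards).mean()
```

In practice, one would plausibly pretrain the marker predictor on sentence pairs mined around explicit discourse markers in a large corpus, then fine-tune the transferred encoder on NLI with a mix of the standard cross-entropy term and a policy-gradient term of the kind sketched above.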
