A Decomposable Attention Model for Natural Language Inference

We propose a simple neural architecture for natural language inference. Our approach uses attention to decompose the problem into subproblems that can be solved separately, thus making it trivially parallelizable. On the Stanford Natural Language Inference (SNLI) dataset, we obtain state-of-the-art results with almost an order of magnitude fewer parameters than previous work and without relying on any word-order information. Adding intra-sentence attention that takes a minimum amount of order into account yields further improvements.
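The abstract's attend/compare/aggregate structure can be made concrete with a short sketch. The following is a minimal, hedged illustration assuming PyTorch; the three feed-forward networks (here `attend`, `compare`, `aggregate`, corresponding to the paper's F, G, and H) and all layer sizes are illustrative choices, not the authors' exact configuration. The key point it demonstrates is that each aligned token pair is compared independently, which is what makes the model trivially parallelizable.

```python
# Minimal sketch of a decomposable attention model for NLI, assuming PyTorch.
# Network names and dimensions are illustrative; inputs are pre-embedded
# token vectors (e.g., GloVe), so no word-order information is used.
import torch
import torch.nn as nn
import torch.nn.functional as F


def mlp(d_in, d_hid, d_out):
    # Two-layer ReLU feed-forward net, the building block for all three stages.
    return nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU(),
                         nn.Linear(d_hid, d_out), nn.ReLU())


class DecomposableAttention(nn.Module):
    def __init__(self, d_embed=300, d_hid=200, n_classes=3):
        super().__init__()
        self.attend = mlp(d_embed, d_hid, d_hid)        # F: scores alignments
        self.compare = mlp(2 * d_embed, d_hid, d_hid)   # G: compares aligned pairs
        self.aggregate = mlp(2 * d_hid, d_hid, d_hid)   # H: combines both sentences
        self.classify = nn.Linear(d_hid, n_classes)

    def forward(self, a, b):
        # a: (batch, len_a, d_embed) premise; b: (batch, len_b, d_embed) hypothesis.
        # Attend: unnormalized alignment scores e_ij = F(a_i) . F(b_j).
        e = torch.bmm(self.attend(a), self.attend(b).transpose(1, 2))
        # Soft-align each sentence against the other.
        beta = torch.bmm(F.softmax(e, dim=2), b)                   # b aligned to a
        alpha = torch.bmm(F.softmax(e, dim=1).transpose(1, 2), a)  # a aligned to b
        # Compare: each (token, aligned subphrase) pair is processed separately.
        v1 = self.compare(torch.cat([a, beta], dim=2))
        v2 = self.compare(torch.cat([b, alpha], dim=2))
        # Aggregate: sum over tokens, then classify the pair.
        v = torch.cat([v1.sum(dim=1), v2.sum(dim=1)], dim=1)
        return self.classify(self.aggregate(v))
```

Because the compare step applies the same small network to every aligned pair with no recurrence, the whole model reduces to batched matrix products, which is also why its parameter count stays roughly an order of magnitude below LSTM-based competitors.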
