Testing the Reasoning Power for NLI Models with Annotated Multi-perspective Entailment Dataset

Natural language inference (NLI) is the challenging task of determining the relationship between a pair of sentences. Existing neural network-based (NN-based) models have achieved prominent success; however, few of them are interpretable. In this paper, we propose a Multi-perspective Entailment Category Labeling System (METALs), which consists of three categories and ten sub-categories, and we manually annotate 3,368 entailment items under this scheme. The annotated data is used to analyze the recognition ability of four NN-based models at a fine-grained level. The experimental results show that all the models perform worse on commonsense reasoning than on the other entailment categories, with an accuracy gap of up to 13.22%.
