Towards a Unified Natural Language Inference Framework to Evaluate Sentence Representations

We present a large-scale unified natural language inference (NLI) dataset that provides insight into how well sentence representations capture distinct types of reasoning. We build the dataset by recasting 11 existing datasets from 7 different semantic tasks. Its approximately half a million context-hypothesis pairs let us test how well sentence encoders capture distinct semantic phenomena that are necessary for general language understanding. The phenomena we consider include event factuality, named entity recognition, figurative language, gendered anaphora resolution, and sentiment analysis, extending prior work that covered semantic roles and frame semantic parsing. Our dataset will be available at this https URL and will grow over time as additional resources are recast.
