Recent Advances in Natural Language Inference: A Survey of Benchmarks, Resources, and Approaches

In the NLP community, recent years have seen a surge of research on machines' ability to perform deep language understanding, which goes beyond what is explicitly stated in text and relies instead on reasoning and knowledge of the world. Many benchmark tasks and datasets have been created to support the development and evaluation of this natural language inference ability. As these benchmarks become instrumental and a driving force for the NLP research community, this paper provides an overview of recent benchmarks, relevant knowledge resources, and state-of-the-art learning and inference approaches in order to support a better understanding of this growing field.

[278]  Erik F. Tjong Kim Sang,et al.  Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition , 2003, CoNLL.