Abductive Commonsense Reasoning

Abductive reasoning is inference to the most plausible explanation. For example, if Jenny finds her house in a mess when she returns from work and remembers that she left a window open, she can hypothesize that the most plausible explanation is that a thief broke into her house and caused the mess. While abduction has long been considered to be at the core of how people interpret and read between the lines in natural language (Hobbs et al., 1988), there has been relatively little research in support of abductive natural language inference and generation. We present the first study that investigates the viability of language-based abductive reasoning. We introduce a challenge dataset, ART, that consists of over 20k commonsense narrative contexts and 200k explanations. Based on this dataset, we conceptualize two new tasks: (i) Abductive NLI, a multiple-choice question-answering task for choosing the more likely explanation, and (ii) Abductive NLG, a conditional generation task for explaining given observations in natural language. On Abductive NLI, the best model achieves 68.9% accuracy, well below human performance of 91.4%. On Abductive NLG, the current best language generators struggle even more, as they lack the reasoning capabilities that are trivial for humans. Our analysis yields new insights into the types of reasoning that deep pre-trained language models fail to perform, despite their strong performance on the related but more narrowly defined task of entailment NLI, and points to interesting avenues for future research.
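
To make the Abductive NLI formulation concrete, here is a minimal sketch of a zero-shot baseline in Python: given the two observations, it ranks the two candidate hypotheses by the likelihood an off-the-shelf language model assigns to the completed narrative. It assumes the HuggingFace transformers library and the public gpt2 checkpoint; the helper functions and the example instance are illustrative assumptions, not the models evaluated in the paper.

    # Abductive NLI as zero-shot likelihood ranking: given observations o1 and o2,
    # pick the hypothesis h that makes the narrative "o1 h o2" more likely under a
    # pretrained LM. Illustrative baseline only, not the paper's reported models.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def narrative_score(o1: str, hyp: str, o2: str) -> float:
        """Average per-token log-likelihood of the full narrative under the LM."""
        ids = tokenizer(f"{o1} {hyp} {o2}", return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(ids, labels=ids).loss  # mean negative log-likelihood
        return -loss.item()

    def choose_explanation(o1: str, o2: str, h1: str, h2: str) -> str:
        """Return the hypothesis that better explains the pair of observations."""
        return h1 if narrative_score(o1, h1, o2) >= narrative_score(o1, h2, o2) else h2

    # An instance in the style of the running example above (hypothetical data):
    o1 = "Jenny left a window open when she went to work."
    o2 = "She came home to find her house in a mess."
    h1 = "A thief climbed in through the open window and ransacked the house."
    h2 = "A gentle breeze came in through the open window."
    print(choose_explanation(o1, o2, h1, h2))

Because the score uses the mean token loss, it is length-normalized, so a longer hypothesis is not automatically penalized; the paper's best Abductive NLI results come from pre-trained models trained on ART rather than from a zero-shot ranker like this one.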

[1] C. Peirce, et al. Pragmatism and Pragmaticism, 1934.

[2] C. Hartshorne, et al. Collected Papers of Charles Sanders Peirce. Nature, 1935.

[3] H. Andersen. Abductive and Deductive Change, 1973.

[4] R. Schank, et al. Scripts, Plans, and Knowledge. IJCAI, 1975.

[5] Peter Norvig, et al. Inference in Text Understanding. AAAI, 1987.

[6] Solomon Eyal Shimony, et al. Probabilistic Semantics for Cost Based Abduction. AAAI, 1990.

[7] Jerry R. Hobbs, et al. Interpretation as Abduction. Artificial Intelligence, 1993.

[8] John R. Josephson, et al. Abductive Inference: Computation, Philosophy, Technology, 1994.

[9] Gary Shank. The Extraordinary Ordinary Powers of Abductive Reasoning, 1998.

[10] Judea Pearl, et al. Reasoning with Cause and Effect. IJCAI, 1999.

[11] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation. ACL, 2002.

[12] G. Lakoff. Linguistics and Natural Logic. Synthese, 1970.

[13] Chin-Yew Lin, et al. ROUGE: A Package for Automatic Evaluation of Summaries. ACL, 2004.

[14] Alon Lavie, et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. IEEvaluation@ACL, 2005.

[15] Jean Carletta, et al. Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. ACL, 2005.

[16] Rajat Raina, et al. Robust Textual Inference via Learning and Abductive Reasoning. AAAI, 2005.

[17] Ido Dagan, et al. The Third PASCAL Recognizing Textual Entailment Challenge. ACL-PASCAL@ACL, 2007.

[18] J. Quiñonero Candela, et al. Machine Learning Challenges: Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Textual Entailment. Lecture Notes in Computer Science, 2006.

[19] Christopher D. Manning, et al. Natural Logic for Textual Inference. ACL-PASCAL@ACL, 2007.

[20] Christiane Fellbaum, et al. On the Role of Lexical and World Knowledge in RTE3. ACL-PASCAL@ACL, 2007.

[21] Christopher D. Manning, et al. An Extended Model of Natural Logic. IWCS, 2009.

[22] Nathanael Chambers, et al. Unsupervised Learning of Narrative Schemas and their Participants. ACL, 2009.

[23] Hector J. Levesque, et al. The Winograd Schema Challenge. AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning, 2011.

[24] Alexander Yates, et al. Types of Common-Sense Knowledge Needed for Recognizing Textual Entailment. ACL, 2011.

[25] Marie-Francine Moens, et al. Skip N-grams and Ranking Functions for Predicting Script Events. EACL, 2012.

[26] Vincent Ng, et al. Resolving Complex Cases of Definite Pronouns: The Winograd Schema Challenge. EMNLP, 2012.

[27] Raymond J. Mooney, et al. Statistical Script Learning with Multi-Argument Events. EACL, 2014.

[28] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation. EMNLP, 2014.

[29] C. Lawrence Zitnick, et al. CIDEr: Consensus-Based Image Description Evaluation. CVPR, 2015.

[30] Andrew S. Gordon, et al. One Hundred Challenge Problems for Logical Formalizations of Commonsense Psychology. AAAI Spring Symposia, 2015.

[31] Francis Ferraro, et al. Script Induction as Language Modeling. EMNLP, 2015.

[32] Christopher Potts, et al. A Large Annotated Corpus for Learning Natural Language Inference. EMNLP, 2015.

[33] Nathanael Chambers, et al. A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories. NAACL, 2016.

[34] Sheng Zhang, et al. Ordinal Common-sense Inference. TACL, 2016.

[35] Zhen-Hua Ling, et al. Enhanced LSTM for Natural Language Inference. ACL, 2017.

[36] Holger Schwenk, et al. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data. EMNLP, 2017.

[37] Omer Levy, et al. Annotation Artifacts in Natural Language Inference Data. NAACL, 2018.

[38] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations. NAACL, 2018.

[39] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. NAACL, 2018.

[40] Rachel Rudinger, et al. Hypothesis Only Baselines in Natural Language Inference. *SEM, 2018.

[41] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. BlackboxNLP@EMNLP, 2018.

[42] Masatoshi Tsuchiya, et al. Performance Impact Caused by Hidden Bias of Training Data for Recognizing Textual Entailment. LREC, 2018.

[43] Yejin Choi, et al. SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference. EMNLP, 2018.

[44] J. Pearl, et al. The Book of Why: The New Science of Cause and Effect, 2018.

[45] Thomas Lukasiewicz, et al. e-SNLI: Natural Language Inference with Natural Language Explanations. NeurIPS, 2018.

[46] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.

[47] Yejin Choi, et al. ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning. AAAI, 2019.

[48] Ali Farhadi, et al. HellaSwag: Can a Machine Really Finish Your Sentence? ACL, 2019.

[49] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.

[50] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL, 2019.

[51] Yejin Choi, et al. COMET: Commonsense Transformers for Automatic Knowledge Graph Construction. ACL, 2019.

[52] Kilian Q. Weinberger, et al. BERTScore: Evaluating Text Generation with BERT. ICLR, 2019.

[53] Ronan Le Bras, et al. WinoGrande. AAAI, 2019.