Unsupervised Commonsense Question Answering with Self-Talk

Natural language understanding involves reading between the lines with implicit background knowledge. Current systems either rely on pre-trained language models as the sole implicit source of world knowledge, or resort to external knowledge bases (KBs) to incorporate additional relevant knowledge. We propose an unsupervised framework based on self-talk as a novel alternative for multiple-choice commonsense tasks. Inspired by inquiry-based discovery learning (Bruner, 1961), our approach queries language models with a number of information-seeking questions, such as "what is the definition of ...", to discover additional background knowledge. Empirical results demonstrate that the self-talk procedure substantially improves the performance of zero-shot language model baselines on four out of six commonsense benchmarks, and competes with models that obtain knowledge from external KBs. While our approach improves performance on several benchmarks, the knowledge elicited by self-talk, even when it leads to correct answers, is not always judged as useful by human annotators, raising interesting questions about the inner workings of pre-trained language models for commonsense reasoning.
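As a rough illustration of the procedure described above, the sketch below shows one way a self-talk step could be implemented with an off-the-shelf GPT-2 model from the HuggingFace Transformers library [42]: the model is prompted with an information-seeking question prefix to generate a clarification, the clarification is added to the original context, and each answer choice is then ranked by its language-model log-likelihood. The model name, prompt prefix, sampling settings, and scoring function are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the self-talk idea (all names and settings are illustrative
# assumptions; this is not the authors' released code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()


def generate_clarification(context, question_prefix, max_new_tokens=30):
    """Prompt the LM with an information-seeking question prefix and let it
    complete the question and answer itself; the completion serves as the
    clarification text."""
    inputs = tokenizer(f"{context} {question_prefix}", return_tensors="pt")
    output = model.generate(
        **inputs,
        do_sample=True,
        top_p=0.9,  # nucleus sampling (Holtzman et al., 2019) [14]
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


def average_log_likelihood(text):
    """Average per-token log-likelihood of `text` under the LM."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return -loss.item()


def self_talk_answer(context, question, choices):
    """Enrich the context with a self-generated clarification, then pick the
    answer choice with the highest LM score."""
    prefix = "What is the definition of"  # one of several hand-written prefixes
    clarification = generate_clarification(context, prefix)
    enriched = f"{context} {prefix}{clarification} {question}"
    scores = [average_log_likelihood(f"{enriched} {choice}") for choice in choices]
    return choices[scores.index(max(scores))]
```

In the full approach several such prefixes are used and multiple clarifications are generated per instance; the single fixed prefix here only keeps the sketch short.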

[1] Yue Zhang et al. Does it Make Sense? And Why? A Pilot Study for Sense Making and Explanation. ACL, 2019.

[2] Wenhan Xiong et al. Improving Question Answering over Incomplete KBs with Knowledge-Aware Reader. ACL, 2019.

[3] Ming Yan et al. Incorporating Relation Knowledge into Commonsense Reading Comprehension with Multi-task Learning. CIKM, 2019.

[4] Alec Radford et al. Improving Language Understanding by Generative Pre-Training. 2018.

[5] Yejin Choi et al. ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning. AAAI, 2019.

[6] Ilya Sutskever et al. Language Models are Unsupervised Multitask Learners. 2019.

[7] Ming-Feng Tsai et al. TTTTTackling WinoGrande Schemas. ArXiv, 2020.

[8] Benjamin Van Durme et al. Reporting bias and knowledge acquisition. AKBC, 2013.

[9] Kenton Lee et al. Contextualized Representations Using Textual Encyclopedic Knowledge. ArXiv, 2020.

[10] Zornitsa Kozareva et al. SemEval-2012 Task 7: Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning. *SEMEVAL, 2011.

[11] Omer Levy et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach. ArXiv, 2019.

[12] Ramakanth Pasunuru et al. Soft Layer-Specific Multi-Task Summarization with Entailment and Question Generation. ACL, 2018.

[13] Jonathan Berant et al. The Web as a Knowledge-Base for Answering Complex Questions. NAACL, 2018.

[14] Yejin Choi et al. The Curious Case of Neural Text Degeneration. ICLR, 2019.

[15] Omer Levy et al. Annotation Artifacts in Natural Language Inference Data. NAACL, 2018.

[16] Guillaume Bouchard et al. Interpretation of Natural Language Rules in Conversational Machine Reading. EMNLP, 2018.

[17] Sebastian Riedel et al. Language Models as Knowledge Bases? EMNLP, 2019.

[18] Bhuwan Dhingra et al. Simple and Effective Semi-Supervised Question Answering. NAACL, 2018.

[19] Hal Daumé et al. Answer-based Adversarial Training for Generating Clarification Questions. NAACL, 2019.

[20] Xinya Du et al. Learning to Ask: Neural Question Generation for Reading Comprehension. ACL, 2017.

[21] Ido Dagan et al. Path-based vs. Distributional Information in Recognizing Lexical Semantic Relations. CogALex@COLING, 2016.

[23] Chitta Baral et al. How Additional Knowledge can Improve Natural Language Commonsense Question Answering. 2020.

[24] Eric P. Xing et al. Self-Training for Jointly Learning to Ask and Answer Questions. NAACL, 2018.

[25] Catherine Havasi et al. Representing General Relational Knowledge in ConceptNet 5. LREC, 2012.

[26] Jonathan Berant et al. Explaining Question Answering Models through Text Generation. ArXiv, 2020.

[27] Tassilo Klein et al. Learning to Answer by Learning to Ask: Getting the Best of GPT-2 and BERT Worlds. ArXiv, 2019.

[28] Xinya Du et al. Harvesting Paragraph-level Question-Answer Pairs from Wikipedia. ACL, 2018.

[29] Jimmy Lin et al. Adaptive Pruning of Neural Language Models for Mobile Devices. ArXiv, 2018.

[30] Ido Dagan et al. Paraphrase to Explicate: Revealing Implicit Noun-Compound Relations. ACL, 2018.

[31] Xiang Ren et al. KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning. EMNLP, 2019.

[32] Ido Dagan et al. Recognizing Textual Entailment: Models and Applications. 2013.

[33] Hector J. Levesque et al. The Winograd Schema Challenge. AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning, 2011.

[34] Oren Etzioni et al. Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. ArXiv, 2018.

[35] Hinrich Schütze et al. Negated LAMA: Birds cannot fly. ArXiv, 2019.

[36] Yejin Choi et al. Social IQA: Commonsense Reasoning about Social Interactions. EMNLP, 2019.

[37] Alexander M. Rush et al. Commonsense Knowledge Mining from Pretrained Models. EMNLP, 2019.

[38] J. Bruner. The act of discovery. 1961.

[39] Sanja Fidler et al. Learning to Caption Images Through a Lifetime by Asking Questions. IEEE/CVF International Conference on Computer Vision (ICCV), 2019.

[40] Lucy Vanderwende. The Importance of Being Important: Question Generation. 2008.

[41] Richard Socher et al. Explain Yourself! Leveraging Language Models for Commonsense Reasoning. ACL, 2019.

[42] Rémi Louf et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing. ArXiv, 2019.

[43] Chao Wang et al. Explicit Utilization of General Knowledge in Machine Reading Comprehension. ACL, 2018.

[44] Dan Roth et al. “Going on a vacation” takes longer than “Going for a walk”: A Study of Temporal Commonsense Understanding. EMNLP, 2019.

[45] Mohit Bansal et al. Commonsense for Generative Multi-Hop Question Answering Tasks. EMNLP, 2018.

[46] Hongyu Lin et al. Reasoning with Heterogeneous Knowledge for Commonsense Machine Comprehension. EMNLP, 2017.

[47] Bhavana Dalvi et al. Reasoning about Actions and State Changes by Injecting Commonsense Knowledge. EMNLP, 2018.

[48] Kyunghyun Cho et al. Unsupervised Question Decomposition for Question Answering. EMNLP, 2020.

[49] Thomas Wolf et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. ArXiv, 2019.

[50] Nathanael Chambers et al. A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories. NAACL, 2016.

[51] Yejin Choi et al. SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference. EMNLP, 2018.

[52] Ming-Wei Chang et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL, 2019.

[53] Simon Ostermann et al. SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. *SEMEVAL, 2018.

[54] Sameer Singh et al. Barack’s Wife Hillary: Using Knowledge Graphs for Fact-Aware Language Modeling. ACL, 2019.

[55] Chris Dyer et al. The NarrativeQA Reading Comprehension Challenge. TACL, 2017.

[56] Peter Potash et al. Playing log(N)-Questions over Sentences. ArXiv, 2019.

[57] Deng Cai et al. Reinforced Dynamic Reasoning for Conversational Question Generation. ACL, 2019.

[58] Colin Raffel et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res., 2019.

[59] Yejin Choi et al. PIQA: Reasoning about Physical Commonsense in Natural Language. AAAI, 2019.

[60] Yejin Choi et al. COMET: Commonsense Transformers for Automatic Knowledge Graph Construction. ACL, 2019.

[61] Ramesh Nallapati et al. Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering. ACL, 2020.

[62] J. R. Landis et al. The measurement of observer agreement for categorical data. Biometrics, 1977.

[63] Zhou Yu et al. Incorporating Structured Commonsense Knowledge in Story Completion. AAAI, 2018.

[64] Jannis Bulian et al. Ask the Right Questions: Active Question Reformulation with Reinforcement Learning. ICLR, 2017.

[65] Yejin Choi et al. WINOGRANDE: An Adversarial Winograd Schema Challenge at Scale. AAAI, 2020.

[66] Yejin Choi et al. Dynamic Knowledge Graph Construction for Zero-shot Commonsense Question Answering. ArXiv, 2019.

[67] Markus Krötzsch et al. Wikidata. Commun. ACM, 2014.

[68] Yiming Yang et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding. NeurIPS, 2019.

[69] Anette Frank et al. Ranking and Selecting Multi-Hop Knowledge Paths to Better Predict Human Needs. NAACL, 2019.

[70] Lynette Hirschman et al. Deep Read: A Reading Comprehension System. ACL, 1999.

[71] Jonathan Berant et al. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. NAACL, 2019.