GLUCOSE: GeneraLized and COntextualized Story Explanations

When humans read or listen, they make implicit commonsense inferences that frame their understanding of what happened and why. As a step toward AI systems that can build similar mental models, we introduce GLUCOSE, a large-scale dataset of implicit commonsense causal knowledge, encoded as causal mini-theories about the world, each grounded in a narrative context. To construct GLUCOSE, we drew on cognitive psychology to identify ten dimensions of causal explanation, focusing on events, states, motivations, and emotions. Each GLUCOSE entry includes a story-specific causal statement paired with an inference rule generalized from the statement. This paper details two concrete contributions: First, we present our platform for effectively crowdsourcing GLUCOSE data at scale, which uses semi-structured templates to elicit causal explanations. Using this platform, we collected 440K specific statements and general rules that capture implicit commonsense knowledge about everyday situations. Second, we show that existing knowledge resources and pretrained language models do not include or readily predict GLUCOSE's rich inferential content. However, when state-of-the-art neural models are trained on this knowledge, they can start to make commonsense inferences on unseen stories that match humans' mental models.