COMET-ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs

Recent years have brought about a renewed interest in commonsense representation and reasoning in the field of natural language understanding. The development of new commonsense knowledge graphs (CSKGs) has been central to these advances as their diverse facts can be used and referenced by machine learning models for tackling new and challenging tasks. At the same time, there remain questions about the quality and coverage of these resources due to the massive scale required to comprehensively encompass general commonsense knowledge. In this work, we posit that manually constructed CSKGs will never achieve the coverage necessary to be applicable in all situations encountered by NLP agents. Therefore, we propose a new evaluation framework for testing the utility of KGs based on how effectively implicit knowledge representations can be learned from them. With this new goal, we propose ATOMIC 2020, a new CSKG of general-purpose commonsense knowledge containing knowledge that is not readily available in pretrained language models. We evaluate its properties in comparison with other leading CSKGs, performing the first large-scale pairwise study of commonsense knowledge resources. Next, we show that ATOMIC 2020 is better suited for training knowledge models that can generate accurate, representative knowledge for new, unseen entities and events. Finally, through human evaluation, we show that the few-shot performance of GPT-3 (175B parameters), while impressive, remains ~12 absolute points lower than that of a BART-based knowledge model trained on ATOMIC 2020, even though the BART-based model uses over 430x fewer parameters.
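The knowledge model described above is, at its core, a pretrained sequence-to-sequence language model (here BART) fine-tuned on (head, relation, tail) tuples from the CSKG so that, at inference time, it can generate plausible tails for heads it has never seen. The sketch below illustrates that setup with the HuggingFace transformers API; the example tuples, the "head relation [GEN]" serialization, and the hyperparameters are illustrative assumptions, not the authors' released training code.

```python
# Minimal sketch of a COMET-style knowledge model: fine-tune BART to map
# (head event, relation) -> tail phrase. Tuples, input format, and
# hyperparameters below are illustrative assumptions.
import torch
from transformers import BartForConditionalGeneration, BartTokenizerFast

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# Hypothetical ATOMIC-2020-style tuples: (head event, relation, tail phrase).
triples = [
    ("PersonX pays for PersonY's ___", "xIntent", "to be helpful"),
    ("PersonX drinks a cup of coffee", "xEffect", "feels more awake"),
]

# Serialize each tuple as "head relation [GEN]" -> "tail" (one plausible encoding).
sources = [f"{h} {r} [GEN]" for h, r, _ in triples]
targets = [t for _, _, t in triples]

enc = tokenizer(sources, padding=True, return_tensors="pt")
labels = tokenizer(targets, padding=True, return_tensors="pt").input_ids
labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
loss = model(input_ids=enc.input_ids,
             attention_mask=enc.attention_mask,
             labels=labels).loss
loss.backward()   # a single toy gradient step; real training iterates over the full KG
optimizer.step()

# Once trained, the knowledge model generates tails for unseen heads.
model.eval()
query = tokenizer("PersonX adopts a rescue dog xWant [GEN]", return_tensors="pt")
out = model.generate(**query, num_beams=5, max_length=24)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The same recipe underlies the paper's evaluation framework: a KG is judged by how well a model fine-tuned on it, as above, generalizes to heads outside the training triples, rather than by the raw size or coverage of the graph itself.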
