Do Language Models Perform Generalizable Commonsense Inference?

Inspired by evidence that pretrained language models (LMs) encode commonsense knowledge, recent work has applied LMs to automatically populate commonsense knowledge graphs (CKGs). However, little is understood about how well this ability generalizes to multiple CKGs, unseen relations, and novel entities. This paper analyzes the ability of LMs to perform generalizable commonsense inference in terms of knowledge capacity, transferability, and induction. Our experiments on these three aspects show that: (1) LMs can adapt to different schemas defined by multiple CKGs but fail to reuse that knowledge to generalize to unseen relations. (2) Adapted LMs generalize well to unseen subjects, but less well to novel objects. Future work should investigate how to improve the transferability and induction of commonsense mining from LMs.
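
As a rough illustration of the kind of LM-based CKG population the abstract refers to, the minimal sketch below prompts a pretrained LM to complete an ATOMIC-style triple (subject and relation given, object generated), in the spirit of COMET-style knowledge models. The model name, prompt wording, and decoding settings are illustrative assumptions, not the authors' actual experimental setup.

```python
# Minimal sketch (assumptions, not the paper's setup): prompt a pretrained LM
# to generate the object of a commonsense triple given a subject and relation.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Subject event and an ATOMIC-style relation verbalized as natural language;
# the generated continuation plays the role of the inferred object.
prompt = "PersonX goes to the store. As a result, PersonX wants to"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=10,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

In practice, knowledge models such as COMET fine-tune the LM on existing CKG triples before generation; the zero-shot prompting above is only meant to show the input/output interface of the task.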
