COMET: Commonsense Transformers for Automatic Knowledge Graph Construction

We present the first comprehensive study on automatic knowledge base construction for two prevalent commonsense knowledge graphs: ATOMIC (Sap et al., 2019) and ConceptNet (Speer et al., 2017). Contrary to many conventional KBs that store knowledge with canonical templates, commonsense KBs only store loosely structured open-text descriptions of knowledge. We posit that an important step toward automatic commonsense completion is the development of generative models of commonsense knowledge, and propose COMmonsEnse Transformers (COMET) that learn to generate rich and diverse commonsense descriptions in natural language. Despite the challenges of commonsense modeling, our investigation reveals promising results when implicit knowledge from deep pre-trained language models is transferred to generate explicit knowledge in commonsense knowledge graphs. Empirical results demonstrate that COMET is able to generate novel knowledge that humans rate as high quality, with up to 77.5% (ATOMIC) and 91.7% (ConceptNet) precision at top 1, which approaches human performance for these resources. Our findings suggest that using generative commonsense models for automatic commonsense KB completion could soon be a plausible alternative to extractive methods.
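The core idea in the abstract — treating knowledge graph completion as conditional generation rather than extraction — can be sketched in a few lines. A tuple (head, relation, tail) is flattened into one token sequence, and the language model is fine-tuned to predict only the tail tokens given the head and relation as context. The relation-token formatting and helper below are illustrative assumptions, not the paper's exact tokenization.

```python
def make_training_sequence(head, relation, tail):
    """Flatten a knowledge tuple into the token sequence the LM trains on.

    The loss mask restricts training loss to the tail tokens, so the model
    learns p(tail | head, relation). At test time, a novel (head, relation)
    query is fed as a prefix and the tail is generated token by token.
    Whitespace tokenization and the <relation> marker are simplifying
    assumptions for this sketch.
    """
    head_toks = head.split()
    rel_toks = [f"<{relation}>"]  # relation rendered as a single special token
    tail_toks = tail.split()
    tokens = head_toks + rel_toks + tail_toks
    # 0 = conditioning context (no loss), 1 = generation target (loss)
    loss_mask = [0] * (len(head_toks) + len(rel_toks)) + [1] * len(tail_toks)
    return tokens, loss_mask

# Example with an ATOMIC-style event and relation:
tokens, mask = make_training_sequence(
    "PersonX goes to the mall", "xIntent", "to buy clothes")
```

Because the conditioning format is shared across relations, a single model covers the whole graph, and the pre-trained language model's implicit knowledge is what lets it produce plausible tails for heads it never saw during fine-tuning.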

[1] Jens Lehmann et al. DBpedia: A Nucleus for a Web of Open Data, 2007, ISWC/ASWC.

[2] Catherine Havasi et al. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge, 2017, AAAI.

[3] Gerhard Weikum et al. Scalable knowledge harvesting with high precision and high recall, 2011, WSDM '11.

[4] S. Holm. A Simple Sequentially Rejective Multiple Test Procedure, 1979.

[5] Lukasz Kaiser et al. Attention is All you Need, 2017, NIPS.

[6] Gerhard Weikum et al. PATTY: A Taxonomy of Relational Patterns with Semantic Types, 2012, EMNLP.

[7] Yejin Choi et al. ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning, 2019, AAAI.

[8] Jürgen Schmidhuber et al. Long Short-Term Memory, 1997, Neural Computation.

[9] Gerhard Weikum et al. YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia: Extended Abstract, 2013, IJCAI.

[10] Benjamin Van Durme et al. Reporting bias and knowledge acquisition, 2013, AKBC '13.

[11] Xiang Li et al. Commonsense Knowledge Base Completion, 2016, ACL.

[12] Christopher Ré et al. Web-scale knowledge-base construction via statistical inference and learning, 2012.

[13] Ming Zhou et al. Neural Open Information Extraction, 2018, ACL.

[14] Gerhard Weikum et al. YAGO: A Core of Semantic Knowledge, 2007, WWW.

[15] Ming-Wei Chang et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[16] Oren Etzioni et al. Open Information Extraction: The Second Generation, 2011, IJCAI.

[17] Praveen Paritosh et al. Freebase: a collaboratively created graph database for structuring human knowledge, 2008, SIGMOD Conference.

[18] Jeffrey Pennington et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[19] Luke S. Zettlemoyer et al. Deep Contextualized Word Representations, 2018, NAACL.

[20] Geoffrey E. Hinton et al. Layer Normalization, 2016, ArXiv.

[21] Tomas Mikolov et al. Enriching Word Vectors with Subword Information, 2016, TACL.

[22] George A. Miller et al. WordNet: A Lexical Database for English, 1995, HLT.

[23] Alec Radford et al. Improving Language Understanding by Generative Pre-Training, 2018.

[24] Junji Tomita et al. Commonsense Knowledge Base Completion and Generation, 2018, CoNLL.

[25] Douglas B. Lenat et al. CYC: a large-scale investment in knowledge infrastructure, 1995, CACM.

[26] Oren Etzioni et al. Open Language Learning for Information Extraction, 2012, EMNLP.

[27] Wei Zhang et al. Knowledge vault: a web-scale approach to probabilistic knowledge fusion, 2014, KDD.

[28] Olivier Bodenreider et al. The Unified Medical Language System (UMLS): integrating biomedical terminology, 2004, Nucleic Acids Res.

[29] Aditya Kalyanpur et al. PRISMATIC: Inducing Knowledge from a Large Scale Lexicalized Relation Resource, 2010, HLT-NAACL.

[30] Oren Etzioni et al. Identifying Relations for Open Information Extraction, 2011, EMNLP.

[31] Estevam R. Hruschka et al. Toward an Architecture for Never-Ending Language Learning, 2010, AAAI.

[32] J. Treble et al. Computer Intensive Methods for Hypothesis Testing, 1990.

[33] Kevin Gimpel et al. Bridging Nonlinearities and Stochastic Regularizers with Gaussian Error Linear Units, 2016, ArXiv.

[34] Omer Levy et al. Zero-Shot Relation Extraction via Reading Comprehension, 2017, CoNLL.

[35] Oren Etzioni et al. Adapting Open Information Extraction to Domain-Specific Relations, 2010, AI Mag.

[36] Quoc V. Le et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.