Language Models are Open Knowledge Graphs

This paper shows how to construct knowledge graphs (KGs) from pre-trained language models (e.g., BERT, GPT-2/3) without human supervision. Popular KGs (e.g., Wikidata, NELL) are built in a supervised or semi-supervised manner, requiring humans to create the knowledge. Recent deep language models instead acquire knowledge automatically from large-scale corpora via pre-training, and this stored knowledge has improved downstream NLP tasks such as answering questions and writing code and articles. In this paper, we propose an unsupervised method to cast the knowledge contained within language models into KGs. We show that KGs can be constructed with a single forward pass of a pre-trained language model (without fine-tuning) over a corpus. We demonstrate the quality of the constructed KGs by comparing them to two human-created KGs (Wikidata, TAC KBP). Our KGs also provide open factual knowledge that is not present in the existing KGs. Our code and KGs will be made publicly available.
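To make the "single forward pass, no fine-tuning" idea concrete, below is a minimal sketch of the general recipe rather than the paper's exact extraction algorithm: run a pre-trained BERT once over a sentence with attention outputs enabled, then use the averaged attention matrix to pick the most strongly connected word between two entity mentions as a candidate relation. The model name, example sentence, entity positions, and the simple attention-sum scoring rule are all illustrative assumptions.

```python
# Sketch: read a candidate (head, relation, tail) triple off a pre-trained
# language model's attention, using one forward pass and no fine-tuning.
# This is a simplified proxy for attention-based triple extraction, not the
# paper's exact method.

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertModel.from_pretrained("bert-base-cased", output_attentions=True)
model.eval()

sentence = "Paris is the capital of France."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)  # single forward pass of the pre-trained model

# outputs.attentions: one (batch, heads, seq, seq) tensor per layer.
# Average over layers and heads to obtain a single seq x seq attention matrix.
attn = torch.stack(outputs.attentions).mean(dim=(0, 2)).squeeze(0)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
head_idx = tokens.index("Paris")   # assumed head entity position
tail_idx = tokens.index("France")  # assumed tail entity position

# Score each token between the two entities as a relation candidate by how
# much attention it places on both entity tokens (a crude matching heuristic).
best_tok, best_score = None, -1.0
for i in range(head_idx + 1, tail_idx):
    score = float(attn[i, head_idx] + attn[i, tail_idx])
    if score > best_score:
        best_tok, best_score = tokens[i], score

# Prints the highest-scoring candidate triple, e.g. ('Paris', <token>, 'France').
print(("Paris", best_tok, "France"))
```

In practice, a full system would generate entity candidates automatically, search over multi-token relation spans, and map the extracted phrases onto a KG schema; the sketch only illustrates that the knowledge is read out from a frozen model's attention in one pass.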
