Language Models as Knowledge Bases: On Entity Representations, Storage Capacity, and Paraphrased Queries

Pretrained language models have been suggested as a possible alternative or complement to structured knowledge bases. However, this emerging LM-as-KB paradigm has so far been considered only in a very limited setting, which handles just the 21k entities whose single-token names appear in common LM vocabularies. Furthermore, the main benefit of this paradigm, namely querying the KB with a variety of natural language paraphrases, remains underexplored. Here, we formulate two basic requirements for treating LMs as KBs: (i) the ability to store a large number of facts involving a large number of entities and (ii) the ability to query stored facts. We explore three entity representations that allow LMs to represent millions of entities and present a detailed case study on paraphrased querying of world knowledge in LMs, thereby providing a proof-of-concept that language models can indeed serve as knowledge bases.
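
As a minimal illustration of requirement (ii), the sketch below queries a masked LM with two paraphrases of the same relational fact, in the spirit of the cloze-style probing the abstract describes. It assumes the HuggingFace `transformers` library; the model choice and example queries are illustrative, not the paper's actual setup.

```python
# Minimal sketch: cloze-style KB querying of a masked LM.
# Assumes HuggingFace `transformers`; model and queries are illustrative.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-cased")

# Two paraphrases of the same fact query; an LM that usably serves as a KB
# should return the same entity (here, "Florence") for both phrasings.
for query in [
    "Dante was born in [MASK].",
    "The birthplace of Dante is [MASK].",
]:
    top = fill(query, top_k=1)[0]  # highest-scoring vocabulary token
    print(f"{query} -> {top['token_str']} (score={top['score']:.3f})")
```

Note that this single-token prediction is exactly the limitation the abstract points out: only entities whose names occupy one vocabulary token can be retrieved this way, which motivates the alternative entity representations the paper explores.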
