Language Models as Knowledge Bases: On Entity Representations, Storage Capacity, and Paraphrased Queries

Pretrained language models have been suggested as a possible alternative or complement to structured knowledge bases. However, this emerging LM-as-KB paradigm has so far been considered only in a very limited setting, which handles just the 21k entities whose single-token names appear in common LM vocabularies. Furthermore, the main benefit of this paradigm, namely querying the KB with a variety of natural language paraphrases, remains underexplored. Here, we formulate two basic requirements for treating LMs as KBs: (i) the ability to store a large number of facts involving a large number of entities and (ii) the ability to query stored facts. We explore three entity representations that allow LMs to represent millions of entities and present a detailed case study on paraphrased querying of world knowledge in LMs, thereby providing a proof-of-concept that language models can indeed serve as knowledge bases.
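
As a minimal illustration of requirement (ii), the sketch below queries a masked LM with two paraphrases of the same relational fact, in the spirit of the cloze-style probing the abstract describes. It assumes the HuggingFace `transformers` library; the model choice and example queries are illustrative, not the paper's actual setup.

```python
# Minimal sketch: cloze-style KB querying of a masked LM.
# Assumes HuggingFace `transformers`; model and queries are illustrative.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-cased")

# Two paraphrases of the same fact query; an LM that usably serves as a KB
# should return the same entity (here, "Florence") for both phrasings.
for query in [
    "Dante was born in [MASK].",
    "The birthplace of Dante is [MASK].",
]:
    top = fill(query, top_k=1)[0]  # highest-scoring vocabulary token
    print(f"{query} -> {top['token_str']} (score={top['score']:.3f})")
```

Note that this single-token prediction is exactly the limitation the abstract points out: only entities whose names occupy one vocabulary token can be retrieved this way, which motivates the alternative entity representations the paper explores.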
