Autoregressive Entity Retrieval

Entities are at the center of how we represent and aggregate knowledge. For instance, encyclopedias such as Wikipedia are structured by entities (e.g., one per article). The ability to retrieve such entities given a query is fundamental for knowledge-intensive tasks such as entity linking and open-domain question answering. One way to understand current approaches is as classifiers among atomic labels, one for each entity. Their weight vectors are dense entity representations produced by encoding entity information such as descriptions. This approach leads to several shortcomings: i) the affinity between context and entity is mainly captured through a vector dot product, potentially missing fine-grained interactions between the two; ii) a large memory footprint is needed to store dense representations when considering large entity sets; iii) an appropriately hard set of negative data has to be subsampled at training time. We propose GENRE, the first system that retrieves entities by generating their unique names, left to right, token-by-token in an autoregressive fashion, conditioned on the context. This mitigates the aforementioned technical issues: i) the autoregressive formulation allows us to directly capture relations between context and entity name, effectively cross-encoding both; ii) the memory footprint is greatly reduced because the parameters of our encoder-decoder architecture scale with vocabulary size, not entity count; iii) the exact softmax loss can be efficiently computed without the need to subsample negative data. We show the efficacy of the approach on more than 20 datasets covering entity disambiguation, end-to-end entity linking and document retrieval tasks, achieving new state-of-the-art or very competitive results while using a tiny fraction of the memory of competing systems. Finally, we demonstrate that new entities can be added by simply specifying their unambiguous name.
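
To make the idea of retrieving an entity by generating its name token by token more concrete, here is a minimal sketch in Python. It is not the GENRE implementation. The abstract does not state how generation is restricted to valid entity names, so the sketch assumes a prefix tree (trie) over tokenized names and uses greedy decoding rather than beam search for brevity. The names `build_trie`, `allowed_next`, `constrained_greedy`, the `END` marker, and the toy scoring table are all hypothetical; in a real system the scores would come from an encoder-decoder model conditioned on the input context.

```python
from typing import Callable, Dict, List, Sequence

END = "</s>"  # hypothetical end-of-name marker


def build_trie(names: Sequence[Sequence[str]]) -> dict:
    """Build a prefix tree (nested dicts) over tokenized entity names."""
    root: dict = {}
    for tokens in names:
        node = root
        for tok in list(tokens) + [END]:
            node = node.setdefault(tok, {})
    return root


def allowed_next(trie: dict, prefix: Sequence[str]) -> List[str]:
    """Return the tokens that can follow `prefix` in some entity name."""
    node = trie
    for tok in prefix:
        node = node.get(tok)
        if node is None:
            return []
    return sorted(node)


def constrained_greedy(
    trie: dict, score: Callable[[List[str], str], float]
) -> List[str]:
    """Greedily pick the best-scoring allowed token until END is chosen.

    `score(prefix, token)` stands in for the decoder's next-token score
    given the context and the tokens generated so far.
    """
    prefix: List[str] = []
    while True:
        candidates = allowed_next(trie, prefix)
        if not candidates:  # cannot happen for a prefix taken from the trie
            return prefix
        best = max(candidates, key=lambda tok: score(prefix, tok))
        if best == END:
            return prefix
        prefix.append(best)


# Toy entity set, already split into tokens (assumption: whitespace tokens).
ENTITY_NAMES = [
    ["Leonardo", "da", "Vinci"],
    ["Leonardo", "DiCaprio"],
    ["Titanic"],
]

# Toy stand-in for model scores given a context about the film actor.
TOY_SCORES: Dict[str, float] = {
    "Leonardo": 0.9, "DiCaprio": 0.8, "da": 0.1, "Titanic": 0.5, END: 1.0,
}


def toy_score(prefix: List[str], token: str) -> float:
    return TOY_SCORES.get(token, 0.0)


trie = build_trie(ENTITY_NAMES)
result = constrained_greedy(trie, toy_score)
print(" ".join(result))
```

Because every step only considers tokens that continue some name in the trie, the output is always an existing entity name, and adding a new entity under these assumptions amounts to inserting its tokenized name into the trie.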
