You can't pick your neighbors, or can you? When and how to rely on retrieval in the kNN-LM

Retrieval-enhanced language models (LMs), which condition their predictions on text retrieved from large external datastores, have recently shown significant perplexity improvements compared to standard LMs. One such approach, the $k$NN-LM, interpolates any existing LM's predictions with the output of a $k$-nearest neighbors model and requires no additional training. In this paper, we explore the importance of lexical and semantic matching in the context of items retrieved by the $k$NN-LM. We find two trends: (1) the presence of large overlapping $n$-grams between the datastore and evaluation set is an important factor in strong performance, even when the datastore is derived from the training data; and (2) the $k$NN-LM is most beneficial when retrieved items have high semantic similarity with the query. Based on our analysis, we define a new formulation of the $k$NN-LM that uses retrieval quality to assign the interpolation coefficient. We empirically measure the effectiveness of our approach on two English language modeling datasets, Wikitext-103 and PG-19. Our re-formulation of the $k$NN-LM is beneficial in both cases, and leads to a nearly 4% improvement in perplexity on the Wikitext-103 test set.
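The $k$NN-LM combines the base LM's next-token distribution with a distribution built from retrieved nearest neighbors, $p(w \mid x) = \lambda\, p_{\mathrm{kNN}}(w \mid x) + (1 - \lambda)\, p_{\mathrm{LM}}(w \mid x)$, where the standard formulation fixes $\lambda$. The sketch below illustrates the general idea of assigning the interpolation coefficient from retrieval quality, as described in the abstract; the specific choice of a sigmoid over the mean negative neighbor distance, and the parameters `a` and `b`, are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np


def knn_lm_interpolate(p_lm, p_knn, retrieval_distances, a=1.0, b=0.0):
    """Interpolate base-LM and kNN next-token distributions.

    Standard kNN-LM:
        p(w | x) = lambda * p_kNN(w | x) + (1 - lambda) * p_LM(w | x)
    with a fixed lambda. Here lambda is instead derived from retrieval
    quality; the sigmoid over the mean negative neighbor distance is one
    plausible choice, not the paper's exact definition.
    """
    # Higher semantic similarity (smaller distances) -> larger lambda.
    quality = -np.mean(retrieval_distances)
    lam = 1.0 / (1.0 + np.exp(-(a * quality + b)))  # sigmoid, in (0, 1)
    return lam * p_knn + (1.0 - lam) * p_lm


# Toy usage: a 5-word vocabulary, flat base-LM distribution, peaked kNN distribution.
p_lm = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
p_knn = np.array([0.7, 0.1, 0.1, 0.05, 0.05])
close_neighbors = np.array([0.1, 0.2, 0.3])  # high-quality retrieval
far_neighbors = np.array([5.0, 6.0, 7.0])    # low-quality retrieval

print(knn_lm_interpolate(p_lm, p_knn, close_neighbors))  # mixes in the kNN distribution
print(knn_lm_interpolate(p_lm, p_knn, far_neighbors))    # nearly falls back to the base LM
```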
