You can't pick your neighbors, or can you? When and how to rely on retrieval in the kNN-LM

Retrieval-enhanced language models (LMs), which condition their predictions on text retrieved from large external datastores, have recently shown significant perplexity improvements compared to standard LMs. One such approach, the $k$NN-LM, interpolates any existing LM's predictions with the output of a $k$-nearest neighbors model and requires no additional training. In this paper, we explore the importance of lexical and semantic matching in the context of items retrieved by the $k$NN-LM. We find two trends: (1) the presence of large overlapping $n$-grams between the datastore and evaluation set is an important factor in strong performance, even when the datastore is derived from the training data; and (2) the $k$NN-LM is most beneficial when retrieved items have high semantic similarity with the query. Based on our analysis, we define a new formulation of the $k$NN-LM that uses retrieval quality to assign the interpolation coefficient. We empirically measure the effectiveness of our approach on two English language modeling datasets, Wikitext-103 and PG-19. Our re-formulation of the $k$NN-LM is beneficial in both cases, and leads to a nearly 4% improvement in perplexity on the Wikitext-103 test set.
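The $k$NN-LM combines the base LM's next-token distribution with a distribution built from retrieved nearest neighbors, $p(w \mid x) = \lambda\, p_{\mathrm{kNN}}(w \mid x) + (1 - \lambda)\, p_{\mathrm{LM}}(w \mid x)$, where the standard formulation fixes $\lambda$. The sketch below illustrates the general idea of assigning the interpolation coefficient from retrieval quality, as described in the abstract; the specific choice of a sigmoid over the mean negative neighbor distance, and the parameters `a` and `b`, are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np


def knn_lm_interpolate(p_lm, p_knn, retrieval_distances, a=1.0, b=0.0):
    """Interpolate base-LM and kNN next-token distributions.

    Standard kNN-LM:
        p(w | x) = lambda * p_kNN(w | x) + (1 - lambda) * p_LM(w | x)
    with a fixed lambda. Here lambda is instead derived from retrieval
    quality; the sigmoid over the mean negative neighbor distance is one
    plausible choice, not the paper's exact definition.
    """
    # Higher semantic similarity (smaller distances) -> larger lambda.
    quality = -np.mean(retrieval_distances)
    lam = 1.0 / (1.0 + np.exp(-(a * quality + b)))  # sigmoid, in (0, 1)
    return lam * p_knn + (1.0 - lam) * p_lm


# Toy usage: a 5-word vocabulary, flat base-LM distribution, peaked kNN distribution.
p_lm = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
p_knn = np.array([0.7, 0.1, 0.1, 0.05, 0.05])
close_neighbors = np.array([0.1, 0.2, 0.3])  # high-quality retrieval
far_neighbors = np.array([5.0, 6.0, 7.0])    # low-quality retrieval

print(knn_lm_interpolate(p_lm, p_knn, close_neighbors))  # mixes in the kNN distribution
print(knn_lm_interpolate(p_lm, p_knn, far_neighbors))    # nearly falls back to the base LM
```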
