Large Language Models for Information Retrieval: A Survey

As a primary means of information acquisition, information retrieval (IR) systems, such as search engines, have become an integral part of our daily lives. They also serve as components of dialogue, question-answering, and recommender systems. IR has evolved from its origins in term-based methods to its current integration with advanced neural models. While neural models excel at capturing complex contextual signals and semantic nuances, thereby reshaping the IR landscape, they still face challenges such as data scarcity, limited interpretability, and the generation of contextually plausible yet potentially inaccurate responses. This evolution calls for combining traditional methods (such as fast term-based sparse retrieval) with modern neural architectures (such as language models with powerful language understanding capabilities). Meanwhile, the emergence of large language models (LLMs), typified by ChatGPT and GPT-4, has revolutionized natural language processing with their remarkable language understanding, generation, generalization, and reasoning abilities. Consequently, recent research has sought to leverage LLMs to improve IR systems. Given the rapid evolution of this research direction, a comprehensive overview is needed to consolidate existing methodologies and provide nuanced insights. In this survey, we examine the confluence of LLMs and IR systems, covering crucial aspects such as query rewriters, retrievers, rerankers, and readers. Additionally, we explore promising directions within this expanding field.
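To make the framing above concrete, the following is a minimal, hypothetical sketch of the LLM-enhanced IR pipeline the survey organizes around (query rewriter, retriever, reranker, reader). The `llm` callable is a stand-in for any text-in/text-out model, and the helper names (`rewrite_query`, `retrieve`, `rerank`, `read`) are illustrative assumptions, not an implementation from any surveyed work; the retriever uses a simple term-overlap score in place of BM25 or a dense bi-encoder so the example stays self-contained.

```python
# Hypothetical sketch of an LLM-enhanced IR pipeline:
# query rewriter -> retriever -> reranker -> reader.
from collections import Counter
import math

def llm(prompt: str) -> str:
    """Placeholder for a real LLM call (assumption: any text-in/text-out model).
    Echoes the query portion so the sketch runs end to end without an API."""
    return prompt.split("Query:")[-1].strip()

def rewrite_query(query: str) -> str:
    # Query rewriter: an LLM reformulates the query before retrieval.
    return llm(f"Rewrite this search query to be clearer. Query: {query}")

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    # Retriever: length-normalized term overlap, standing in for
    # BM25 or a dense bi-encoder.
    q_terms = Counter(query.lower().split())
    def score(doc: str) -> float:
        d_terms = Counter(doc.lower().split())
        overlap = sum(min(q_terms[t], d_terms[t]) for t in q_terms)
        return overlap / math.sqrt(len(doc.split()) + 1)
    return sorted(corpus, key=score, reverse=True)[:k]

def rerank(query: str, docs: list[str]) -> list[str]:
    # Reranker: here an LLM would judge relevance per document
    # (pointwise/pairwise/listwise prompting); this sketch keeps
    # the retrieval order.
    return docs

def read(query: str, docs: list[str]) -> str:
    # Reader: generate an answer grounded in the retrieved evidence
    # (retrieval-augmented generation).
    context = "\n".join(docs)
    return llm(f"Context:\n{context}\nAnswer the question. Query: {query}")

if __name__ == "__main__":
    corpus = [
        "sparse retrieval ranks documents by term overlap",
        "dense retrievers embed queries and documents",
        "large language models can rerank search results",
    ]
    q = "how do retrievers rank documents"
    print(read(q, rerank(q, retrieve(rewrite_query(q), corpus))))
```

In practice, each stage would replace its placeholder with the corresponding LLM-based component, for instance prompting a model to expand the query, to score query-document pairs, or to generate an answer grounded in the retrieved passages.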
