Zero-Shot Listwise Document Reranking with a Large Language Model

Supervised ranking methods based on bi-encoder or cross-encoder architectures have shown success in multi-stage text ranking tasks, but they require large amounts of relevance judgments as training data. In this work, we propose Listwise Reranker with a Large Language Model (LRL), which achieves strong reranking effectiveness without using any task-specific training data. Unlike existing pointwise ranking methods, which score each document independently and rank by those scores, LRL directly generates a reordered list of document identifiers given the candidate documents. Experiments on three TREC web search datasets demonstrate that LRL not only outperforms zero-shot pointwise methods when reranking first-stage retrieval results, but can also act as an efficient final-stage reranker that improves the top-ranked results of a pointwise method. Additionally, we apply our approach to subsets of MIRACL, a recent multilingual retrieval dataset, with results showing its potential to generalize across different languages.
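The listwise formulation described above is straightforward to prototype: pack the candidate documents into a single prompt, tag each with an identifier, and ask the model to emit the identifiers in relevance order. The following is a minimal sketch of that idea, not the paper's exact setup; the prompt wording, the [i] identifier scheme, and call_llm (a hypothetical stand-in for a call to an instruction-following LLM) are illustrative assumptions.

```python
# Illustrative sketch of listwise reranking in the spirit of LRL.
# The prompt template, identifier format, and `call_llm` are assumptions,
# not details taken from the paper.
import re


def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: send the prompt to an instruction-following
    LLM and return its text completion. Replace with a real API call."""
    raise NotImplementedError


def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble one prompt containing all candidates, each tagged with an
    identifier, asking for a reordered list of identifiers."""
    lines = [f"[{i + 1}] {p}" for i, p in enumerate(passages)]
    return (
        "Sort the following passages by relevance to the query.\n"
        f"Query: {query}\n\n"
        + "\n".join(lines)
        + "\n\nOutput only the passage identifiers, most relevant first, "
          "e.g. [2] > [1] > [3]."
    )


def parse_ranking(output: str, n: int) -> list[int]:
    """Extract identifiers from the model output, dropping duplicates and
    out-of-range values, then append any omitted identifiers so the result
    is always a full permutation of the candidates."""
    order: list[int] = []
    for tok in re.findall(r"\[(\d+)\]", output):
        i = int(tok) - 1
        if 0 <= i < n and i not in order:
            order.append(i)
    order.extend(i for i in range(n) if i not in order)
    return order


def listwise_rerank(query: str, passages: list[str]) -> list[str]:
    """Return the candidate passages reordered by the model's listwise output."""
    output = call_llm(build_prompt(query, passages))
    return [passages[i] for i in parse_ranking(output, len(passages))]
```

Note the contrast with a pointwise reranker, which would issue one scoring call per passage and sort by the scores; here a single call sees all candidates at once, and the parsing step guards against the model omitting or repeating identifiers.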
