TILDE: Term Independent Likelihood moDEl for Passage Re-ranking

Deep language models (deep LMs) are increasingly used for full-text retrieval, or as later-stage re-rankers within cascade retrieval pipelines. A key problem with deep LMs is that a slow inference step must be performed at query time; this hinders the practical adoption of these powerful retrieval models, or substantially limits how many documents can be considered for re-ranking. We propose the novel, BERT-based Term Independent Likelihood moDEl (TILDE), which ranks documents by both query and document likelihood. At query time, our model does not require the inference step of other deep LM-based retrieval approaches, providing consistent time savings: the likelihoods of query terms can be pre-computed and stored at index creation time. This is achieved by relaxing the term dependence assumption made by deep LMs. In addition, we devise a novel bi-directional training loss that allows TILDE to maximise both query and document likelihood simultaneously during training. At query time, TILDE can rely on its query likelihood component (TILDE-QL) alone, or on the combination of TILDE-QL and its document likelihood component (TILDE-DL), thus providing a flexible trade-off between efficiency and effectiveness: exploiting both components yields the highest effectiveness at a higher computational cost, while relying on TILDE-QL alone trades some effectiveness for faster response times, since no neural inference is required at query time. TILDE is evaluated on the MS MARCO and TREC Deep Learning 2019 and 2020 passage ranking datasets. Empirical results show that, compared to other approaches that aim to make deep language models operationally viable, TILDE achieves competitive effectiveness coupled with low query latency.
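To make the core relaxation concrete, the following is a minimal sketch in standard language-modelling notation. The factorisation is what the term-independence assumption buys: each per-token log-likelihood can be pre-computed and stored at index time. The token-level form of the bi-directional loss shown here is our assumption for illustration; the paper's exact formulation may differ.

```latex
% Term-independence relaxation: the query likelihood factorises over tokens,
% so each \log P_\theta(q_i \mid d) can be pre-computed at index time.
\log P(q \mid d) \;=\; \sum_{i=1}^{|q|} \log P_\theta(q_i \mid d)

% Bi-directional training objective (illustrative token-level form):
% minimise the query-likelihood and document-likelihood losses jointly.
\mathcal{L} \;=\;
\underbrace{-\sum_{i=1}^{|q|} \log P_\theta(q_i \mid d)}_{\mathcal{L}_{\mathrm{QL}}}
\;+\;
\underbrace{-\sum_{j=1}^{|d|} \log P_\theta(d_j \mid q)}_{\mathcal{L}_{\mathrm{DL}}}
```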
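The pre-compute-then-lookup pattern behind TILDE-QL can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: it loads a vanilla bert-base-uncased checkpoint via HuggingFace Transformers (a checkpoint fine-tuned with the bi-directional loss would be used in practice), and the function names precompute_token_logprobs and score_query are our own.

```python
# Minimal sketch of TILDE-QL-style scoring: one BERT pass per passage at
# index time, then query-time scoring by table lookup only (no inference).
# Assumptions: we use BertForMaskedLM's LM head over the [CLS] position to
# produce a vocabulary-wide distribution; a vanilla bert-base-uncased
# checkpoint stands in for a properly fine-tuned TILDE model.
import torch
from transformers import BertForMaskedLM, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

@torch.no_grad()
def precompute_token_logprobs(passage: str) -> torch.Tensor:
    """Index time: run BERT once and store one log-probability per
    vocabulary token for this passage (a vector of size vocab_size)."""
    inputs = tokenizer(passage, return_tensors="pt", truncation=True)
    logits = model(**inputs).logits          # (1, seq_len, vocab_size)
    cls_logits = logits[0, 0]                # distribution from [CLS]
    return torch.log_softmax(cls_logits, dim=-1)

def score_query(query: str, passage_logprobs: torch.Tensor) -> float:
    """Query time: under term independence, log P(q|d) is the sum of
    the stored per-token log-probabilities, a pure lookup."""
    token_ids = tokenizer(query, add_special_tokens=False)["input_ids"]
    return float(passage_logprobs[token_ids].sum())

# Usage: pre-compute at indexing, score with lookups at query time.
stored = precompute_token_logprobs("TILDE pre-computes term likelihoods "
                                   "so no model runs at query time.")
print(score_query("term likelihood", stored))
```

Re-ranking a candidate list then costs only |q| lookups and additions per passage, which is why TILDE-QL's query latency is essentially independent of the language model's size; the extra cost of the combined TILDE-QL + TILDE-DL configuration comes from the document likelihood side, which still involves model inference over the (short) query at ranking time.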
