Query Embedding Pruning for Dense Retrieval

Recent advances in dense retrieval techniques have offered the promise of not just re-ranking documents using contextualised language models such as BERT, but also of using such models to identify candidate documents from the collection in the first place. However, when using dense retrieval approaches that use multiple embedded representations for each query, a large number of documents can be retrieved for each query, hindering the efficiency of the method. Hence, this work is the first to consider efficiency improvements in the context of a dense retrieval approach (namely ColBERT), by pruning query term embeddings that are estimated not to be useful for retrieving relevant documents. Our proposed query embedding pruning reduces the cost of the dense retrieval operation, as well as reducing the number of documents that are retrieved and hence need to be fully scored. Experiments conducted on the MSMARCO passage ranking corpus demonstrate that, when reducing the number of query embeddings used from 32 to 3 based on the collection frequency of the corresponding tokens, query embedding pruning results in no statistically significant differences in effectiveness, while reducing the number of documents retrieved by 70%. In terms of mean response time for the end-to-end system, this results in a 2.65x speedup.
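To make the idea concrete, the following is a minimal sketch of collection-frequency-based query embedding pruning, not the paper's actual implementation: given one embedding per query token (e.g. 32 in ColBERT), only the embeddings of the k rarest tokens are kept for the first-stage retrieval. The names `prune_query_embeddings` and `collection_frequency`, the toy token ids, and the assumption that the rarest (highest-IDF) tokens are the ones retained are all illustrative assumptions.

```python
# Minimal sketch of query embedding pruning based on collection frequency.
# Assumes rarer tokens are more discriminative and are therefore retained
# for the first-stage (set retrieval) phase; this is an illustration only.
import numpy as np

def prune_query_embeddings(token_ids, query_embs, collection_frequency, k=3):
    """Keep the k query embeddings whose tokens are rarest in the collection.

    token_ids: list[int] of length Q (e.g. Q = 32 query tokens in ColBERT)
    query_embs: np.ndarray of shape (Q, dim), one embedding per query token
    collection_frequency: dict mapping token id -> corpus frequency (assumed precomputed)
    """
    freqs = np.array([collection_frequency.get(t, 0) for t in token_ids])
    keep = np.argsort(freqs)[:k]                    # indices of the k rarest tokens
    return query_embs[keep], [token_ids[i] for i in keep]

# Toy usage: 5 query tokens with 4-dimensional embeddings.
rng = np.random.default_rng(0)
token_ids = [2009, 2003, 1996, 3007, 1997]          # hypothetical wordpiece ids
query_embs = rng.normal(size=(5, 4)).astype(np.float32)
cf = {2009: 120, 2003: 9_500_000, 1996: 12_000_000, 3007: 80_000, 1997: 11_000_000}

pruned_embs, kept_ids = prune_query_embeddings(token_ids, query_embs, cf, k=3)
print(kept_ids)           # the 3 rarest tokens drive the nearest-neighbour lookup
print(pruned_embs.shape)  # (3, 4)
```

Only the pruned embeddings would then be used to query the approximate nearest-neighbour index, so fewer candidate documents are retrieved and passed on to full late-interaction scoring.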
