Conformer-Kernel with Query Term Independence for Document Retrieval

The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark, and can be considered an efficient (but slightly less effective) alternative to BERT-based ranking models. In this work, we extend the TK architecture to the full retrieval setting by incorporating the query term independence assumption. Furthermore, to reduce the memory complexity of the Transformer layers with respect to the input sequence length, we propose a new Conformer layer. We show that the Conformer's GPU memory requirement scales linearly with the input sequence length, making it a more viable option for ranking long documents. Finally, we demonstrate that incorporating an explicit term-matching signal into the model can be particularly useful in the full retrieval setting. We present preliminary results from this work.
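
To make the two main ideas in the abstract concrete, the sketch below illustrates (a) the query term independence (QTI) assumption, under which the query-document score decomposes into a sum of per-query-term scores that can be precomputed and stored in an inverted index, and (b) one common way to make self-attention memory scale linearly with sequence length by applying softmax separately to the queries and keys so the full n-by-n attention matrix is never materialized. This is a minimal illustrative sketch, not the paper's exact Conformer formulation; the function names and the specific attention variant are assumptions.

```python
import torch


def qti_score(per_term_scores: torch.Tensor) -> torch.Tensor:
    """Query term independence: the query-document score is the sum of
    independently computed per-query-term scores.

    per_term_scores: shape (num_query_terms,), one score per query term
                     against the same document (these can be precomputed
                     offline, one pass per vocabulary term).
    """
    return per_term_scores.sum()


def separable_attention(Q: torch.Tensor, K: torch.Tensor, V: torch.Tensor) -> torch.Tensor:
    """A linear-memory attention variant (illustrative; the paper's Conformer
    layer may differ in detail).

    Instead of forming the (n x n) matrix softmax(Q K^T) V, softmax is applied
    separately to Q (over the feature dimension) and K (over the sequence
    dimension), and K^T V is computed first. Peak memory then grows with
    n * d and d * d rather than n * n.

    Q, K, V: shape (n, d) for sequence length n and model dimension d.
    """
    Q = torch.softmax(Q, dim=-1)          # (n, d), normalized over features
    K = torch.softmax(K, dim=-2)          # (n, d), normalized over positions
    context = K.transpose(-2, -1) @ V     # (d, d), no n x n intermediate
    return Q @ context                    # (n, d)


if __name__ == "__main__":
    # Toy usage: a long "document" of 4096 tokens with 256-dim representations.
    n, d = 4096, 256
    Q, K, V = torch.randn(n, d), torch.randn(n, d), torch.randn(n, d)
    out = separable_attention(Q, K, V)
    print(out.shape)                      # torch.Size([4096, 256])

    # QTI: combine hypothetical per-term scores for a 3-term query.
    print(qti_score(torch.tensor([0.7, 1.2, 0.1])))
```

The key point of the separable form is that the quadratic-in-n attention matrix is replaced by a d-by-d context matrix, which is what allows memory to grow linearly with document length; the QTI decomposition is what allows the learned model to be used for full-collection retrieval via a standard inverted index rather than only for reranking.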
