Co-BERT: A Context-Aware BERT Retrieval Model Incorporating Local and Query-specific Context

BERT-based text ranking models have dramatically advanced the state of the art in ad-hoc retrieval, yet most of them score individual query-document pairs independently. Meanwhile, the importance and usefulness of modeling cross-document interactions and query-specific characteristics in a ranking model have been repeatedly confirmed, mostly in the context of learning to rank. Existing BERT-based ranking models, however, do not fully incorporate these two types of ranking context, thereby ignoring the relationships among the documents being ranked and the differences among queries. To mitigate this gap, this work proposes Co-BERT, an end-to-end transformer-based ranking model that exploits several BERT architectures to calibrate the query-document representations using pseudo relevance feedback before modeling the relevance of a group of documents jointly. Extensive experiments on two standard test collections confirm the effectiveness of the proposed model in improving text re-ranking over strong fine-tuned BERT-Base baselines. We plan to make our implementation open source to enable further comparisons.
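
To make the idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of a groupwise BERT re-ranker in the spirit described by the abstract: each candidate document is encoded together with the query by BERT, the resulting [CLS] vectors are calibrated against pseudo relevance feedback (PRF) documents, and a small transformer layer then scores the whole group of candidates jointly. All module names, the cross-attention calibration, and the choice of the top-k first-stage candidates as PRF documents are illustrative assumptions.

# Hypothetical sketch of a PRF-calibrated, groupwise BERT re-ranker (PyTorch).
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast


class GroupwiseBertReranker(nn.Module):
    def __init__(self, bert_name="bert-base-uncased", num_prf_docs=3):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        self.num_prf_docs = num_prf_docs
        # Cross-attention as one plausible way to let PRF documents
        # calibrate each candidate's query-document representation.
        self.prf_attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        # Groupwise interaction: candidates in the same ranking attend to each
        # other, so scores are not computed for each pair in isolation.
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.group_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.score = nn.Linear(hidden, 1)

    def encode_pairs(self, tokenizer, query, docs, device):
        # Encode each (query, document) pair and keep the [CLS] vector.
        batch = tokenizer([query] * len(docs), docs, padding=True, truncation=True,
                          max_length=256, return_tensors="pt").to(device)
        return self.bert(**batch).last_hidden_state[:, 0]  # (num_docs, hidden)

    def forward(self, tokenizer, query, candidate_docs):
        device = next(self.parameters()).device
        cand = self.encode_pairs(tokenizer, query, candidate_docs, device)
        # Pseudo relevance feedback: treat the top-k candidates of the
        # first-stage ranking as feedback documents (an assumption here) and
        # let them calibrate every candidate representation.
        prf = cand[: self.num_prf_docs].unsqueeze(0)      # (1, k, hidden)
        cand = cand.unsqueeze(0)                          # (1, n, hidden)
        calibrated, _ = self.prf_attn(cand, prf, prf)
        # Jointly model the relevance of the whole group of candidates.
        group = self.group_encoder(calibrated)
        return self.score(group).squeeze(-1).squeeze(0)   # (n,) relevance scores


if __name__ == "__main__":
    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = GroupwiseBertReranker()
    docs = ["a document about neural ranking",
            "an unrelated passage",
            "BERT for ad-hoc retrieval"]
    scores = model(tokenizer, "bert based document ranking", docs)
    print(scores)  # one relevance score per candidate, computed jointly

The key contrast with a standard pointwise BERT re-ranker is the final group_encoder step: because all candidates for a query pass through it together, the score of one document can depend on the other retrieved documents and on query-specific context rather than on the query-document pair alone.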
