RepBERT: Contextualized Text Embeddings for First-Stage Retrieval

Although exact term matching between queries and documents is the dominant method for first-stage retrieval, we propose a different approach, called RepBERT, that represents documents and queries with fixed-length contextualized embeddings. The inner products of query and document embeddings are taken as relevance scores. On the MS MARCO Passage Ranking task, RepBERT achieves state-of-the-art results among all initial retrieval techniques, and its efficiency is comparable to that of bag-of-words methods.
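
To make the scoring scheme concrete, here is a minimal sketch of the dual-encoder idea described above, not the paper's exact training or pooling setup: a shared off-the-shelf BERT encodes queries and passages, token vectors are mean-pooled into fixed-length embeddings, and passages are ranked by inner product. The model name, pooling choice, and sequence length here are assumptions for illustration.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# A shared encoder for queries and passages; using the base uncased
# checkpoint is an assumption of this sketch, not the paper's setup.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

def embed(texts):
    """Encode texts into fixed-length embeddings by mean-pooling the
    last-layer token vectors over non-padding positions (assumed pooling)."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=256, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state      # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float() # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)          # (B, H)

query_emb = embed(["what is the capital of france"])
doc_embs = embed(["Paris is the capital and largest city of France.",
                  "BM25 is a bag-of-words ranking function."])

# Relevance score = inner product of query and document embeddings.
scores = query_emb @ doc_embs.T   # shape (1, num_docs)
print(scores)
```

Because scoring reduces to inner products, document embeddings can be precomputed offline and searched with standard maximum inner product search tooling, which is what makes this kind of dense first-stage retrieval efficient at scale.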
