CEDR: Contextualized Embeddings for Document Ranking

Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. In this work, we investigate how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking. Through experiments on TREC benchmarks, we find that several existing neural ranking architectures can benefit from the additional context provided by contextualized language models. Furthermore, we propose a joint approach that incorporates BERT's classification vector into existing neural models and show that it outperforms state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR (Contextualized Embeddings for Document Ranking). We also address practical challenges in using these models for ranking, including the maximum input length imposed by BERT and the runtime performance impact of contextualized language models.
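To make the joint approach concrete, the following is a minimal PyTorch sketch (an illustration of the general shape, not the authors' released code): BERT's contextualized token embeddings feed a downstream neural ranker, and BERT's [CLS] classification vector is concatenated with the ranker's features before scoring. The mean-pooling "ranker" and all names here (e.g., CedrStyleRanker, ranker_dim) are hypothetical placeholders standing in for interaction-based architectures such as KNRM, PACRR, or DRMM.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class CedrStyleRanker(nn.Module):
    """Sketch of a joint ranker: existing-model features + BERT's [CLS] vector."""

    def __init__(self, ranker_dim: int = 32):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size  # 768 for bert-base
        # Hypothetical placeholder for an existing neural ranker that
        # consumes contextualized term embeddings and emits features.
        self.ranker = nn.Sequential(nn.Linear(hidden, ranker_dim), nn.ReLU())
        # Joint scoring over ranker features concatenated with [CLS].
        self.score = nn.Linear(ranker_dim + hidden, 1)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        tokens = out.last_hidden_state           # [batch, seq, hidden] contextualized embeddings
        cls_vec = tokens[:, 0]                   # BERT's classification ([CLS]) vector
        # Crude mean pooling stands in for real query-document matching.
        feats = self.ranker(tokens.mean(dim=1))
        return self.score(torch.cat([feats, cls_vec], dim=-1)).squeeze(-1)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("example query", "example document text",
                return_tensors="pt", truncation=True, max_length=512)
model = CedrStyleRanker()
relevance = model(enc["input_ids"], enc["attention_mask"])  # one score per pair
```

Note that this sketch sidesteps BERT's maximum input length simply by truncating to 512 tokens; the paper treats long documents as a practical challenge in its own right, so truncation here is only for illustration.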
