Capreolus: A Toolkit for End-to-End Neural Ad Hoc Retrieval

We present Capreolus, a toolkit designed to facilitate end-to-end ad hoc retrieval experiments with neural networks by providing implementations of prominent neural ranking models within a common framework. Our toolkit adopts a standard reranking architecture via tight integration with the Anserini toolkit, which generates candidate documents using standard bag-of-words approaches. Using Capreolus, we are able to reproduce Yang et al.'s recent SIGIR 2019 finding that, in a reranking scenario on the test collection from the TREC 2004 Robust Track, many neural retrieval models do not significantly outperform a strong query expansion baseline. Furthermore, we find that this holds true for five additional models implemented in Capreolus. We describe the architecture and design of our toolkit, which includes a Web interface to facilitate comparisons between rankings returned by different models.
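The reranking architecture described above has two stages: a bag-of-words model retrieves a candidate list, and a second model rescores only those candidates. The sketch below illustrates that flow under stated assumptions. It is not Capreolus's or Anserini's API. Capreolus delegates the first stage to Anserini (Java/Lucene), whereas here BM25 is reimplemented in pure Python, and the strong baseline's RM3 query expansion is omitted. The names `BM25`, `rerank`, and `overlap_scorer` are hypothetical. `overlap_scorer` is a trivial stand-in for a trained neural ranker such as KNRM or PACRR, and the BM25 parameter values are illustrative.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A scorer maps (query, document text) to a relevance score.
Scorer = Callable[[str, str], float]


def tokenize(text: str) -> List[str]:
    """Naive whitespace tokenizer (assumption: no stemming or stopwords)."""
    return text.lower().split()


class BM25:
    """Minimal in-memory BM25, a stand-in for an Anserini first-stage run."""

    def __init__(self, docs: Dict[str, str], k1: float = 0.9, b: float = 0.4):
        self.k1, self.b = k1, b
        self.tf = {d: Counter(tokenize(t)) for d, t in docs.items()}
        self.dl = {d: sum(c.values()) for d, c in self.tf.items()}
        self.avgdl = sum(self.dl.values()) / len(self.dl)
        # Document frequency: number of documents containing each term.
        self.df = Counter(term for c in self.tf.values() for term in c)
        self.n = len(docs)

    def idf(self, term: str) -> float:
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query: str, doc_id: str) -> float:
        tf, dl = self.tf[doc_id], self.dl[doc_id]
        norm = 1 - self.b + self.b * dl / self.avgdl
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf(term) * f * (self.k1 + 1) / (f + self.k1 * norm)
        return total

    def search(self, query: str, k: int) -> List[Tuple[str, float]]:
        """Return the top-k (doc_id, score) pairs; ties broken by doc_id."""
        scored = [(d, self.score(query, d)) for d in self.tf]
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[:k]


def overlap_scorer(query: str, doc_text: str) -> float:
    """Hypothetical stand-in for a neural model: fraction of query terms matched."""
    q_terms = set(tokenize(query))
    d_terms = set(tokenize(doc_text))
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def rerank(query: str, candidates: List[Tuple[str, float]],
           docs: Dict[str, str], scorer: Scorer) -> List[Tuple[str, float]]:
    """Second stage: rescore only the first-stage candidates with `scorer`."""
    rescored = [(doc_id, scorer(query, docs[doc_id])) for doc_id, _ in candidates]
    rescored.sort(key=lambda pair: (-pair[1], pair[0]))
    return rescored


if __name__ == "__main__":
    docs = {
        "d1": "neural ranking models for ad hoc retrieval",
        "d2": "bag of words retrieval with query expansion",
        "d3": "cooking recipes for pasta",
        "d4": "ranking ranking ranking",
    }
    query = "neural ranking retrieval"
    candidates = BM25(docs).search(query, k=3)
    print("first stage:", [d for d, _ in candidates])
    print("reranked:", [d for d, _ in rerank(query, candidates, docs, overlap_scorer)])
```

On this toy corpus the first stage returns d1, d4, d2 (d3 is cut off at k=3), and the stand-in reranker reorders the same three candidates to d1, d2, d4. The reranker can change the order of candidates but can never recover a document that the first stage missed. This is one reason the quality of the bag-of-words baseline matters so much in this setting.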

[1] Zhiyuan Liu, et al. End-to-End Neural Ad-hoc Ranking with Kernel Pooling, 2017, SIGIR.

[2] Nazli Goharian, et al. CEDR: Contextualized Embeddings for Document Ranking, 2019, SIGIR.

[3] Nick Craswell, et al. Learning to Match using Local and Distributed Representations of Text for Web Search, 2016, WWW.

[4] Larry P. Heck, et al. Learning deep structured semantic models for web search using clickthrough data, 2013, CIKM.

[5] Yelong Shen, et al. A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval, 2014, CIKM.

[6] Grace Hui Yang, et al. DeepTileBars: Visualizing Term Distribution for Neural Information Retrieval, 2019, AAAI.

[7] Jimmy J. Lin, et al. Critically Examining the "Neural Hype": Weak Baselines and the Additivity of Effectiveness Gains from Neural Ranking Models, 2019, SIGIR.

[8] W. Bruce Croft, et al. A Deep Relevance Matching Model for Ad-hoc Retrieval, 2016, CIKM.

[9] Ion Androutsopoulos, et al. Deep Relevance Ranking Using Enhanced Document-Query Interactions, 2018, EMNLP.

[10] Jamie Callan, et al. Deeper Text Understanding for IR with Contextual Neural Language Modeling, 2019, SIGIR.

[11] Jimmy J. Lin, et al. Cross-Domain Modeling of Sentence-Level Evidence for Document Retrieval, 2019, EMNLP.

[12] Jimmy J. Lin, et al. Simple Applications of BERT for Ad Hoc Document Retrieval, 2019, ArXiv.

[13] Jimmy J. Lin, et al. Anserini, 2018, Journal of Data and Information Quality.

[14] Zhiyuan Liu, et al. Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search, 2018, WSDM.

[15] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[16] W. Bruce Croft, et al. Indri: A language-model based search engine for complex queries (extended version), 2005.

[17] Jun Xu, et al. Modeling Diverse Relevance Patterns in Ad-hoc Retrieval, 2018, SIGIR.

[18] Craig MacDonald, et al. From Puppy to Maturity: Experiences in Developing Terrier, 2012, OSIR@SIGIR.

[19] Gerard de Melo, et al. PACRR: A Position-Aware Neural IR Model for Relevance Matching, 2017, EMNLP.

[20] Xiang Ji, et al. MatchZoo: A Learning, Practicing, and Developing System for Neural Text Matching, 2019, SIGIR.