Let's measure run time! Extending the IR replicability infrastructure to include performance aspects

Establishing a Docker-based replicability infrastructure offers the community a great opportunity: measuring the run time of information retrieval systems. The time required to present query results to a user is paramount to the user's satisfaction. Recent advances in neural IR re-ranking models have put the issue of query latency at the forefront: they bring a complex trade-off between performance and effectiveness that depends on many factors, including the choice of encoding model, the network architecture, and the available hardware acceleration. The best-performing models (currently based on the BERT transformer model) run orders of magnitude more slowly than simpler architectures. We aim to broaden the focus of the neural IR community to include performance considerations -- to sustain the practical applicability of our innovations. In this position paper we support our argument with a case study exploring the performance of different neural re-ranking models. Finally, we propose to extend the OSIRRC Docker-based replicability infrastructure with two performance-focused benchmark scenarios.
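The kind of measurement the abstract calls for can be sketched with a small timing harness. The snippet below is a minimal illustration, not part of the proposed infrastructure: `rerank_fn` is a hypothetical stand-in for any neural re-ranker's per-query scoring call, and a real GPU measurement would additionally need device synchronization (e.g. `torch.cuda.synchronize()`) before reading the clock.

```python
import statistics
import time


def measure_latency(rerank_fn, queries, warmup=3, runs=10):
    """Measure per-query re-ranking latency in milliseconds.

    rerank_fn : callable taking a query; a stand-in for a re-ranker's
                scoring call (hypothetical, for illustration only).
    """
    # Warm-up to amortize one-off costs (model loading, JIT, caches).
    for q in queries[:warmup]:
        rerank_fn(q)

    timings = []
    for _ in range(runs):
        for q in queries:
            start = time.perf_counter()
            rerank_fn(q)
            timings.append((time.perf_counter() - start) * 1000.0)

    timings.sort()
    return {
        "median_ms": statistics.median(timings),
        "p95_ms": timings[int(0.95 * (len(timings) - 1))],
    }


if __name__ == "__main__":
    # Dummy re-ranker that simulates ~1 ms of work per query.
    stats = measure_latency(lambda q: time.sleep(0.001), ["q1", "q2", "q3"])
    print(stats)
```

Reporting a median together with a tail percentile (rather than a single mean) matters here, because tail latency is what users of an interactive retrieval system actually experience.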
