Modularized Transformer-based Ranking Framework

Recent innovations in Transformer-based ranking models have advanced the state of the art in information retrieval. However, these Transformers are computationally expensive, and their opaque hidden states make it hard to understand the ranking process. In this work, we modularize the Transformer ranker into separate modules for text representation and interaction. We show how this design enables substantially faster ranking using offline pre-computed representations and lightweight online interactions. The modular design is also easier to interpret and sheds light on the ranking process in Transformer rankers.
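
To make the two-module split concrete, the following is a minimal PyTorch sketch of the idea: a representation module encodes documents offline, and a lightweight interaction module scores query-document pairs online against the cached representations. The module names (RepresentationModule, InteractionModule), layer sizes, and the single cross-attention scoring head are illustrative assumptions, not the exact architecture proposed in this work.

# Minimal sketch of a modularized Transformer ranker (illustrative only).
# Assumption: documents are encoded once offline; at query time only the
# query encoding and a lightweight cross-attention interaction are run.
import torch
import torch.nn as nn

class RepresentationModule(nn.Module):
    """Encodes text into token-level representations (run offline for documents)."""
    def __init__(self, vocab_size=30522, hidden=256, layers=4, heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        encoder_layer = nn.TransformerEncoderLayer(hidden, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, layers)

    def forward(self, token_ids):
        return self.encoder(self.embed(token_ids))

class InteractionModule(nn.Module):
    """Lightweight online component: query representations attend to
    pre-computed document representations, then a linear head scores relevance."""
    def __init__(self, hidden=256, heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.scorer = nn.Linear(hidden, 1)

    def forward(self, query_reps, doc_reps):
        attended, _ = self.cross_attn(query_reps, doc_reps, doc_reps)
        return self.scorer(attended.mean(dim=1)).squeeze(-1)

rep, inter = RepresentationModule(), InteractionModule()

# Offline: pre-compute and cache document representations once.
doc_ids = torch.randint(0, 30522, (8, 128))        # 8 candidate documents, 128 tokens each
with torch.no_grad():
    doc_reps = rep(doc_ids)                         # cached ahead of query time

# Online: encode the short query (cheap) and run only the light interaction.
query_ids = torch.randint(0, 30522, (1, 16))
query_reps = rep(query_ids).expand(8, -1, -1)       # broadcast query over candidates
scores = inter(query_reps, doc_reps)                # one relevance score per document

Separating the modules this way means the expensive document encoding cost is paid once offline, while the per-query cost is dominated by the much smaller interaction step, which is the source of the speedup described above.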
