The Power of Selecting Key Blocks with Local Pre-ranking for Long Document Information Retrieval

On a wide range of natural language processing and information retrieval tasks, transformer-based models, particularly pre-trained language models such as BERT, have demonstrated tremendous effectiveness. Due to the quadratic complexity of the self-attention mechanism, however, such models have difficulties processing long documents. Recent works dealing with this issue include truncating long documents, segmenting them into passages that can be treated by a standard BERT model, or modifying the self-attention mechanism to make it sparser, as in sparse-attention models. However, these approaches either lose information or have high computational complexity (and, in the latter case, are time-, memory-, and energy-consuming). We follow here a slightly different approach in which one first selects key blocks of a long document by local query-block pre-ranking, and then aggregates a few of these blocks to form a short document that can be processed by a model such as BERT. Experiments conducted on standard Information Retrieval datasets demonstrate the effectiveness of the proposed approach.
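The following is a minimal sketch of the block-selection idea described above: split a long document into fixed-size blocks, pre-rank them locally against the query with a cheap lexical score, keep the top-scoring blocks in their original order, and score the aggregated short document with a BERT-style cross-encoder. The block size, the term-overlap pre-ranker (a stand-in for a BM25-style local score), the top-k value, and the checkpoint name are all illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of key-block selection with local pre-ranking for long-document ranking.
# Assumptions: word-level blocks of 64 tokens, term-overlap as the local score,
# and a public MS MARCO cross-encoder checkpoint as the final relevance scorer.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CHECKPOINT = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT)
model.eval()

def split_into_blocks(text: str, block_size: int = 64) -> list:
    """Split a long document into fixed-size word blocks (assumed block size)."""
    words = text.split()
    return [" ".join(words[i:i + block_size]) for i in range(0, len(words), block_size)]

def local_block_score(query: str, block: str) -> float:
    """Cheap local pre-ranking signal: query-term overlap (stand-in for BM25)."""
    query_terms = set(query.lower().split())
    block_terms = block.lower().split()
    return sum(term in query_terms for term in block_terms) / (len(block_terms) + 1e-6)

def score_long_document(query: str, document: str, top_k: int = 6) -> float:
    """Pre-rank blocks locally, keep the top-k, and score the aggregated
    short document with a BERT-style cross-encoder."""
    blocks = split_into_blocks(document)
    scores = [local_block_score(query, block) for block in blocks]
    # Select the top-k blocks but keep them in their original document order.
    top_idx = sorted(sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)[:top_k])
    short_doc = " ".join(blocks[i] for i in top_idx)
    inputs = tokenizer(query, short_doc, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).logits.squeeze().item()
```

In a re-ranking setting, this score would be computed for each candidate document retrieved by a first-stage ranker, so only the aggregated key blocks, rather than the full long document, pass through the expensive transformer.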
