The Power of Selecting Key Blocks with Local Pre-ranking for Long Document Information Retrieval

On a wide range of natural language processing and information retrieval tasks, transformer-based models, particularly pre-trained language models such as BERT, have demonstrated tremendous effectiveness. Due to the quadratic complexity of the self-attention mechanism, however, such models have difficulties processing long documents. Recent works dealing with this issue include truncating long documents, segmenting them into passages that can be treated by a standard BERT model, or modifying the self-attention mechanism to make it sparser, as in sparse-attention models. However, these approaches either lose information or have high computational complexity (and, in the latter case, are time-, memory-, and energy-consuming). We follow here a slightly different approach in which one first selects key blocks of a long document by local query-block pre-ranking, and then aggregates a few of these blocks to form a short document that can be processed by a model such as BERT. Experiments conducted on standard Information Retrieval datasets demonstrate the effectiveness of the proposed approach.
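The following is a minimal sketch of the block-selection idea described above: split a long document into fixed-size blocks, pre-rank them locally against the query with a cheap lexical score, keep the top-scoring blocks in their original order, and score the aggregated short document with a BERT-style cross-encoder. The block size, the term-overlap pre-ranker (a stand-in for a BM25-style local score), the top-k value, and the checkpoint name are all illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of key-block selection with local pre-ranking for long-document ranking.
# Assumptions: word-level blocks of 64 tokens, term-overlap as the local score,
# and a public MS MARCO cross-encoder checkpoint as the final relevance scorer.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CHECKPOINT = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT)
model.eval()

def split_into_blocks(text: str, block_size: int = 64) -> list:
    """Split a long document into fixed-size word blocks (assumed block size)."""
    words = text.split()
    return [" ".join(words[i:i + block_size]) for i in range(0, len(words), block_size)]

def local_block_score(query: str, block: str) -> float:
    """Cheap local pre-ranking signal: query-term overlap (stand-in for BM25)."""
    query_terms = set(query.lower().split())
    block_terms = block.lower().split()
    return sum(term in query_terms for term in block_terms) / (len(block_terms) + 1e-6)

def score_long_document(query: str, document: str, top_k: int = 6) -> float:
    """Pre-rank blocks locally, keep the top-k, and score the aggregated
    short document with a BERT-style cross-encoder."""
    blocks = split_into_blocks(document)
    scores = [local_block_score(query, block) for block in blocks]
    # Select the top-k blocks but keep them in their original document order.
    top_idx = sorted(sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)[:top_k])
    short_doc = " ".join(blocks[i] for i in top_idx)
    inputs = tokenizer(query, short_doc, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).logits.squeeze().item()
```

In a re-ranking setting, this score would be computed for each candidate document retrieved by a first-stage ranker, so only the aggregated key blocks, rather than the full long document, pass through the expensive transformer.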
