KeyBLD: Selecting Key Blocks with Local Pre-ranking for Long Document Information Retrieval

Transformer-based models, and especially pre-trained language models like BERT, have shown great success on a variety of Natural Language Processing and Information Retrieval tasks. However, such models have difficulty processing long documents due to the quadratic complexity of the self-attention mechanism. Recent works either truncate long documents or segment them into passages that can be handled by a standard BERT model. A hierarchical architecture, such as a transformer, can then be adopted to build a document-level representation on top of the passage representations. However, these approaches either lose information or have high computational complexity (and, in the latter case, are both time- and energy-consuming). We follow here a slightly different approach: we first select key blocks of a long document through local query-block pre-ranking, then aggregate a few of these blocks to form a short document that can be processed by a model such as BERT. Experiments conducted on standard Information Retrieval datasets demonstrate the effectiveness of the proposed approach.
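
To make the idea concrete, here is a minimal sketch of the local pre-ranking and aggregation step, assuming fixed-size blocks, a BM25-style score computed locally over the blocks of a single document, and a top-k selection. The function names, the block size, and the number of kept blocks are illustrative assumptions, not the paper's exact settings.

```python
# Minimal sketch of key-block selection for long-document retrieval.
# Names and hyper-parameters (block_size, top_k, the BM25-like scoring)
# are illustrative assumptions, not the authors' exact implementation.
import math
from collections import Counter

def split_into_blocks(tokens, block_size=63):
    """Split a tokenized document into consecutive fixed-size blocks."""
    return [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]

def block_score(query_terms, block, avg_len, df, n_blocks, k1=1.2, b=0.75):
    """BM25-style local relevance score of one block w.r.t. the query,
    with document frequencies computed over the blocks of this document."""
    tf = Counter(block)
    score = 0.0
    for term in query_terms:
        if term not in tf:
            continue
        idf = math.log(1 + (n_blocks - df[term] + 0.5) / (df[term] + 0.5))
        norm = tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(block) / avg_len))
        score += idf * norm
    return score

def select_key_blocks(query, document, top_k=7, block_size=63):
    """Pre-rank blocks locally and concatenate the best ones, in their
    original order, into a short pseudo-document."""
    q_terms = query.lower().split()
    blocks = split_into_blocks(document.lower().split(), block_size)
    if not blocks:
        return ""
    avg_len = sum(len(blk) for blk in blocks) / len(blocks)
    df = Counter(t for blk in blocks for t in set(blk))
    ranked = sorted(
        range(len(blocks)),
        key=lambda i: block_score(q_terms, blocks[i], avg_len, df, len(blocks)),
        reverse=True)
    kept = sorted(ranked[:top_k])  # preserve the original block order
    return " ".join(" ".join(blocks[i]) for i in kept)
```

The short pseudo-document produced this way, paired with the query, fits within the 512-token input limit of a standard BERT cross-encoder re-ranker, which then produces the final relevance score.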
