Empowering Language Model with Guided Knowledge Fusion for Biomedical Document Re-ranking

Pre-trained language models (PLMs) have proven effective for the document re-ranking task. However, they lack the ability to fully interpret the semantics of biomedical and health-care queries and often rely on simplistic patterns when retrieving documents. To address this challenge, we propose an approach that integrates external knowledge with PLMs, guiding the model to capture information from external sources effectively and retrieve the correct documents. Comprehensive experiments on two biomedical and open-domain datasets show that our approach significantly improves over vanilla PLMs and other existing approaches on the document re-ranking task.
