Transformer-Based Open Domain Biomedical Question Answering at BioASQ8 Challenge

BioASQ Task B focuses on biomedical information retrieval and question answering. This paper describes our team's participation and proposed solutions. We build a system that draws on recent advances in the general domain as well as approaches from previous years of the competition. We adapt a pretrained BERT-based system for document and snippet retrieval, question answering, and summarization. We describe all the approaches we experimented with and show that, while neural approaches perform well, baseline approaches sometimes achieve high scores on automatic metrics. The proposed system achieves competitive performance while remaining general enough to be applied to other domains.
