Pre-trained Language Model for Biomedical Question Answering

The recent success of question answering systems is largely attributed to pre-trained language models. However, as language models are mostly pre-trained on general domain corpora such as Wikipedia, they often have difficulty in understanding biomedical questions. In this paper, we investigate the performance of BioBERT, a pre-trained biomedical language model, in answering biomedical questions including factoid, list, and yes/no type questions. BioBERT uses almost the same structure across various question types and achieved the best performance in the 7th BioASQ Challenge (Task 7b, Phase B). BioBERT pre-trained on SQuAD or SQuAD 2.0 easily outperformed previous state-of-the-art models. BioBERT obtains the best performance when it uses the appropriate pre-/post-processing strategies for questions, passages, and answers.

[1]  Kyle Lo,et al.  SciBERT: Pretrained Contextualized Embeddings for Scientific Text , 2019, ArXiv.

[2]  Percy Liang,et al.  Know What You Don’t Know: Unanswerable Questions for SQuAD , 2018, ACL.

[3]  Jonathan Berant,et al.  MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension , 2019, ACL.

[4]  Tapio Salakoski,et al.  Distributional Semantics Resources for Biomedical Text Processing , 2013 .

[5]  Georgios Balikas,et al.  An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition , 2015, BMC Bioinformatics.

[6]  Jaewoo Kang,et al.  BioBERT: a pre-trained biomedical language representation model for biomedical text mining , 2019, Bioinform..

[7]  Yonghwa Choi,et al.  A Neural Named Entity Recognition and Multi-Type Normalization Tool for Biomedical Text Mining , 2019, IEEE Access.

[8]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[9]  Francisco M. Couto,et al.  Using Neural Networks for Relation Extraction from Biomedical Literature , 2019, Artificial Neural Networks, 3rd Edition.

[10]  Eric Nyberg,et al.  Learning to Answer Biomedical Questions: OAQA at BioASQ 4B , 2016 .

[11]  Iz Beltagy,et al.  SciBERT: A Pretrained Language Model for Scientific Text , 2019, EMNLP.

[12]  Ioannis A. Kakadiaris,et al.  Results of the sixth edition of the BioASQ Challenge , 2018 .

[13]  Ioannis A. Kakadiaris,et al.  Results of the 4th edition of BioASQ Challenge , 2016 .

[14]  Mariana L. Neves,et al.  Neural Question Answering at BioASQ 5B , 2017, BioNLP.

[15]  Yoshua Bengio,et al.  Word Representations: A Simple and General Method for Semi-Supervised Learning , 2010, ACL.

[16]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[17]  George Kurian,et al.  Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.

[18]  Jaewoo Kang,et al.  Chemical–gene relation extraction using recursive neural network , 2018, Database J. Biol. Databases Curation.

[19]  Xiaolin Yang,et al.  The cell line ontology-based representation, integration and analysis of cell lines used in China , 2019, BMC Bioinformatics.

[20]  Grigorios Tsoumakas,et al.  Word embeddings and external resources for answer processing in biomedical factoid question answering , 2019, J. Biomed. Informatics.

[21]  Jian Zhang,et al.  SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.

[22]  Mariana L. Neves,et al.  Neural Domain Adaptation for Biomedical Question Answering , 2017, CoNLL.

[23]  Yanchun Zhang,et al.  The Fudan Participation in the 2015 BioASQ Challenge: Large-scale Biomedical Semantic Indexing and Question Answering , 2015, CLEF.

[24]  Wei-Hung Weng,et al.  Publicly Available Clinical BERT Embeddings , 2019, Proceedings of the 2nd Clinical Natural Language Processing Workshop.

[25]  Jaewoo Kang,et al.  CollaboNet: collaboration of deep neural networks for biomedical named entity recognition , 2018, BMC Bioinformatics.

[26]  Luke S. Zettlemoyer,et al.  Deep Contextualized Word Representations , 2018, NAACL.

[27]  Ioannis Ch. Paschalidis,et al.  Clinical Concept Extraction with Contextual Word Embedding , 2018, NIPS 2018.

[28]  Fabio A. González,et al.  MindLab Neural Network Approach at BioASQ 6B , 2018 .