BioAMA: Towards an End to End BioMedical Question Answering System

In this paper, we present BioAMA (“Biomedical Ask Me Anything”), a novel biomedical question answering system built for task 5b of the annual BioASQ challenge (Balikas et al., 2015). We cover a wide variety of question types, including factoid, list, summary, and yes/no questions, and generate both exact and well-formed ‘ideal’ answers. For summary-type questions, we combine effective IR-based techniques for retrieving and diversifying the snippets relevant to a question into an end-to-end system that achieves a ROUGE-2 score of 0.72 and a ROUGE-SU4 score of 0.71 on ideal-answer questions (a 7% improvement over the previous best model). Additionally, we propose a novel Natural Language Inference (NLI) based framework for answering yes/no questions. To train the NLI model, we also devise a transfer-learning technique based on cross-domain projection of word embeddings. Finally, we present a two-stage approach to factoid and list-type questions that first generates a candidate set using NER taggers and then ranks the candidates using both supervised and unsupervised techniques.
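
The abstract does not spell out how snippets are diversified. As a rough illustration only, the sketch below shows one common option, a greedy maximal-marginal-relevance (MMR) style selection that trades relevance to the question against redundancy with snippets already chosen. The function names, the Jaccard token-overlap similarity, and the `trade_off` parameter are assumptions made for this example and are not taken from the BioAMA system.

```python
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenization (a deliberately simple assumption)."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two token sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_snippets(question: str, snippets: List[str],
                    k: int = 2, trade_off: float = 0.5) -> List[str]:
    """Greedily pick up to k snippets, MMR style (hypothetical helper).

    Each step scores a remaining snippet as
    trade_off * relevance - (1 - trade_off) * redundancy,
    where relevance is its similarity to the question and redundancy is
    its highest similarity to any snippet selected so far.
    """
    question_tokens = tokenize(question)
    snippet_tokens = [tokenize(s) for s in snippets]
    selected: List[int] = []
    remaining = list(range(len(snippets)))

    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = jaccard(snippet_tokens[i], question_tokens)
            redundancy = max(
                (jaccard(snippet_tokens[i], snippet_tokens[j]) for j in selected),
                default=0.0,
            )
            return trade_off * relevance - (1 - trade_off) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)

    return [snippets[i] for i in selected]
```

With `trade_off` near 1 the selection reduces to plain relevance ranking; lower values penalize near-duplicate snippets more strongly, which is the effect diversification is meant to have on a summary assembled from retrieved snippets.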
