Learning to Answer Biomedical Questions: OAQA at BioASQ 4B

This paper describes the OAQA system evaluated in the BioASQ 4B Question Answering track. The system extends the Yang et al. (2015) system and integrates additional biomedical and general-purpose NLP annotators, machine learning modules for search result scoring, collective answer reranking, and yes/no answer prediction. We first present the overall architecture of the system and then describe the main extensions to the Yang et al. (2015) approach. Before the official evaluation, we trained the system on the development dataset, excluding the 3B Batch 5 subset. We present initial evaluation results on a subset of the development dataset to demonstrate the effectiveness of the new methods, with a focus on performance analysis of yes/no question answering.

[1] Arthur C. Ciccolo, et al. Towards the Open Advancement of Question Answering Systems, 2009.

[2] Jun'ichi Tsujii, et al. GENIA corpus - a semantically annotated corpus for bio-textmining, 2003, ISMB.

[3] Eric Nyberg, et al. CSE Framework: A UIMA-based Distributed System for Configuration Space Exploration, 2013, UIMA@GSCL.

[4] Bing Liu, et al. Mining and summarizing customer reviews, 2004, KDD.

[5] Georgios Balikas, et al. An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition, 2015, BMC Bioinformatics.

[6] Michael Schroeder, et al. Answering Factoid Questions in the Biomedical Domain, 2013, BioASQ@CLEF.

[7] Eric Nyberg, et al. Building optimal information systems automatically: configuration space exploration for biomedical information systems, 2013, CIKM.

[8] Zhiyong Lu, et al. Beyond accuracy: creating interoperable and scalable text-mining web services, 2016, Bioinform.

[9] Yusuke Miyao, et al. Answering Yes/No Questions via Question Inversion, 2012, COLING.

[10] Ian H. Witten, et al. The WEKA data mining software: an update, 2009, SKDD.

[11] Hideki Mima, et al. Automatic recognition of multi-word terms: the C-value/NC-value method, 2000, International Journal on Digital Libraries.

[12] Alan R. Aronson, et al. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program, 2001, AMIA.

[13] Chih-Jen Lin, et al. LIBLINEAR: A Library for Large Linear Classification, 2008, J. Mach. Learn. Res.

[14] K. Bretonnel Cohen, et al. A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools, 2012, BMC Bioinformatics.

[15] Chi Zhang, et al. Learning to Answer Biomedical Factoid & List Questions: OAQA at BioASQ 3B, 2015, CLEF.

[16] Martha Palmer, et al. Getting the Most out of Transition-based Dependency Parsing, 2011, ACL.

[17] Yong Wang, et al. Using Model Trees for Classification, 1998, Machine Learning.

[18] Eibe Frank, et al. Logistic Model Trees, 2003, Machine Learning.

[19] Katerina T. Frantzi, et al. Automatic recognition of multi-word terms, 1998.

[20] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.