Tackling Biomedical Text Summarization: OAQA at BioASQ 5B

In this paper, we describe our participation in Phase B of Task 5b of the fifth edition of the annual BioASQ challenge, which involves answering factoid, list, yes/no, and summary questions over biomedical data. We describe our techniques with an emphasis on ideal answer generation, where the goal is to produce a relevant, precise, non-redundant, query-oriented summary from multiple relevant documents. We address this task with extractive summarization techniques, experimenting with different biomedical ontologies and with algorithms including agglomerative clustering, Maximum Marginal Relevance (MMR), and sentence compression. We propose a novel word-embedding-based tf-idf similarity metric and a soft positional constraint, both of which improve system performance. We evaluate our techniques on test batch 4 from the fourth edition of the challenge. Our best system achieves a ROUGE-2 score of 0.6534 and a ROUGE-SU4 score of 0.6536.
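
To make the MMR step named above concrete, the following is a minimal sketch of greedy MMR sentence selection. It is not the system described in this paper: the `jaccard` token-overlap similarity is an assumed stand-in for the proposed word-embedding-based tf-idf metric, and the names `mmr_select`, `k`, and `lam` are hypothetical, chosen only for illustration.

```python
def jaccard(a, b):
    """Token-overlap similarity between two strings.

    Assumption: a simple stand-in, not the paper's embedding-based metric.
    """
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def mmr_select(query, sentences, k=2, lam=0.7, sim=jaccard):
    """Greedily pick up to k sentences, trading relevance against redundancy.

    lam is the (hypothetical) trade-off weight: 1.0 ranks by query relevance
    alone; lower values penalize overlap with already selected sentences.
    """
    selected = []
    candidates = list(sentences)
    while candidates and len(selected) < k:
        def score(s):
            relevance = sim(s, query)
            # Redundancy is the highest similarity to any selected sentence.
            redundancy = max((sim(s, t) for t in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

Called with a question and a list of candidate sentences, `mmr_select` returns the chosen sentences in selection order. With `lam` below 1.0, a near-duplicate of an already selected sentence is penalized, so a less similar sentence may be preferred, which is the non-redundancy property the ideal answer requires.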
