NCBI at the 2014 BioASQ Challenge Task: Large-scale Biomedical Semantic Indexing and Question Answering

In this paper we report our participation in the 2014 BioASQ challenge tasks on biomedical semantic indexing and question answering. For the biomedical semantic indexing task (Task 2a), where participating teams are provided with PubMed articles and asked to return relevant MeSH terms, we built on our previous learning-to-rank framework with a special focus on systematically incorporating results of complementary methods for improved performance. For the question answering task (Task 2b), where teams are provided with natural language questions and asked to return responses in the format of documents, snippets, concepts and RDF triplets (Phase A) and direct answers (Phase B), we relied on PubMed search engines and our state-of-the-art named entity recognition tools such as DNorm and tmVar in Phases A and B, respectively. The official challenge results demonstrate that we consistently performed better than the baseline approaches for Task 2a and Task 2b (Phase B), and ranked among the top-tier systems in the 2014 challenge.
