The Fudan-UIUC Participation in the BioASQ Challenge Task 2a: The Antinomyra system

This paper describes the Antinomyra system, which participated in BioASQ Task 2a, the challenge on large-scale biomedical semantic indexing. The system automatically annotates MEDLINE citations with MeSH terms using only title and abstract information. On the official test set (batch 3, week 5), evaluated on the 1,867 citations that had been annotated out of 4,533 in total (as of June 6, 2014), our best submission achieved a flat Micro F-measure of 0.6199. This is 9.8% higher, in relative terms, than the official NLM solution, Medical Text Indexer (MTI), which achieved 0.5647 in flat F-measure.
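To make the reported numbers concrete, the following is a minimal sketch of how a flat micro-averaged F-measure over MeSH label sets and the relative improvement over a baseline can be computed. It is not the official BioASQ evaluation code; the function names and the toy citations are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Set


def micro_f_measure(gold: Iterable[Set[str]], predicted: Iterable[Set[str]]) -> float:
    """Micro-averaged F-measure: pool true/false positives and false
    negatives over all citations, then compute precision, recall and F1."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # terms predicted and present in the gold set
        fp += len(p - g)   # terms predicted but not in the gold set
        fn += len(g - p)   # gold terms that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(score: float, baseline: float) -> float:
    """Improvement of `score` over `baseline`, as a fraction of the baseline."""
    return (score - baseline) / baseline


# Toy example (hypothetical citations and MeSH terms, not challenge data).
gold = [{"Humans", "Neoplasms"}, {"Mice", "Animals", "Liver"}]
pred = [{"Humans", "Neoplasms", "Adult"}, {"Mice", "Animals"}]

print(f"toy micro-F: {micro_f_measure(gold, pred):.4f}")

# Scores reported in the abstract: 0.6199 (best submission) vs. 0.5647 (MTI).
print(f"relative improvement: {relative_improvement(0.6199, 0.5647):.1%}")
```

With the two scores from the abstract, the second call reproduces the stated 9.8% figure, which is a relative gain; the absolute difference is 0.0552.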
