Paper information - An enhanced CRF-based system for disease name entity recognition and normalization on BioCreative V DNER Task

Diseases play a central role in many areas of biomedical research and healthcare. However, the rapid growth of research on diseases and treatments makes it increasingly difficult to aggregate knowledge from the PubMed database. A framework for disease mention recognition and normalization has therefore become increasingly important for biomedical text mining. In this work, we use conditional random fields (CRFs) to develop a recognition system and refine its output with several customized post-processing steps, such as abbreviation resolution and consistency improvement. In the DNER subtask of the BioCreative V CDR task, the system achieved an F-measure of 0.8646 on disease normalization, with a notably high precision of 0.8963.
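The abstract names abbreviation resolution and consistency improvement as post-processing steps but does not describe how they are implemented. The following Python code is only a minimal sketch of what such steps could look like, under simple assumptions: abbreviations are defined in the form "long form (ABBR)", and consistency means tagging every whole-word occurrence of an already recognized mention. The function names `resolve_abbreviations` and `propagate_mentions`, the example sentence, and the tagger output are hypothetical and are not taken from the paper.

```python
import re


def resolve_abbreviations(text, mentions):
    """Add short forms defined as 'long form (ABBR)' when the long form is a known mention.

    Assumption: a simple pattern heuristic, not the paper's actual method.
    """
    resolved = set(mentions)
    for mention in mentions:
        # The known mention, optional whitespace, then a short form in parentheses.
        pattern = re.escape(mention) + r"\s*\(([A-Za-z][A-Za-z0-9-]{1,9})\)"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            resolved.add(match.group(1))
    return resolved


def propagate_mentions(text, mentions):
    """Return sorted (start, end, surface) spans for every whole-word occurrence of each mention."""
    spans = set()
    for mention in mentions:
        # Lookarounds keep the match from starting or ending inside a longer word.
        pattern = r"(?<!\w)" + re.escape(mention) + r"(?!\w)"
        for match in re.finditer(pattern, text):
            spans.add((match.start(), match.end(), match.group(0)))
    return sorted(spans)


# Hypothetical input: one mention found by an upstream tagger (for example a CRF).
text = (
    "Patients with amyotrophic lateral sclerosis (ALS) were enrolled. "
    "ALS progressed in most patients."
)
tagger_mentions = {"amyotrophic lateral sclerosis"}

resolved = resolve_abbreviations(text, tagger_mentions)
spans = propagate_mentions(text, resolved)
for start, end, surface in spans:
    print(start, end, surface)
```

This sketch matches case-sensitively during propagation and does not merge overlapping or nested spans; a real system would need to handle both, along with mapping each mention to a vocabulary identifier for normalization.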
