Paper information - Text mining tools for assisting literature curation

Text mining tools for assisting literature curation

Today's biomedical research depends heavily on access to biological knowledge encoded in expert-curated biological databases (e.g. Swiss-Prot). As the volume of biological literature grows rapidly, it becomes increasingly difficult for human curators to keep up, because manual curation is expensive and time-consuming. Past research has shown that (semi-)automated approaches have the potential to greatly improve manual curation productivity [1-3]. We recently developed PubTator, a web-based application that assists literature curation through the use of various text mining tools [4-6]. PubTator has several distinguishing features. First, PubTator is a web-based system, so it requires no installation and runs on any computing platform with a Web browser. Second, PubTator features a PubMed-like interface that many human curators find familiar and easy to use with minimal training. Third, PubTator integrates multiple competition-winning text mining approaches that we recently developed for recognizing important biological entities: genes/proteins, diseases, mutations, chemicals/drugs, and organisms [7-11]. Its text-mined results therefore reflect state-of-the-art performance for these entity types. Lastly, PubTator stays in sync with PubMed content through nightly updates. Interested users can access our text-mined results via a) the PubTator web interface, b) a RESTful API, or c) FTP download. We have conducted a formal text-mining-aided curation experiment, the results of which showed that PubTator greatly improved both curation efficiency and accuracy [6]. More recently, PubTator has been successfully deployed in practice for the curation of CDC's human genome epidemiology knowledge base. Hence, we conclude that our text-mining tools and PubTator can provide practical benefits to literature curation in bioinformatics research. PubTator is freely available at: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/

Zhiyong Lu | Hung-Yu Kao | Chih-Hsuan Wei

[1] Joel D. Martin, et al. PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine, 2003, BMC Bioinformatics.

[2] Zhiyong Lu, et al. PubTator: a web-based text mining tool for assisting biocuration, 2013, Nucleic Acids Res.

[3] 김삼묘, et al. “Bioinformatics” 특집을 내면서, 2000.

[4] Hung-Yu Kao, et al. Cross-species gene normalization by species inference, 2011, BMC Bioinformatics.

[5] Beatrice Alex, et al. Assisted Curation: Does Text Mining Really Help?, 2007, Pacific Symposium on Biocomputing.

[6] Zhiyong Lu, et al. tmVar: a text mining approach for extracting sequence variants in biomedical literature, 2013, Bioinform.

[7] Zhiyong Lu, et al. SR4GN: A Species Recognition Software Tool for Gene Normalization, 2012, PloS one.

[8] Zhiyong Lu, et al. Accelerating literature curation with text-mining tools: a case study of using PubTator to curate genes in PubMed abstracts, 2012, Database J. Biol. Databases Curation.

[9] Zhiyong Lu, et al. - like interactive curation system for document triage and literature curation, 2012.

[10] Zhiyong Lu, et al. DNorm: disease name normalization with pairwise learning to rank, 2013, Bioinform.

[11] Zhiyong Lu, et al. NCBI at the BioCreative IV CHEMDNER Task: Recognizing chemical names in PubMed articles with tmChem, 2013.

[12] Kimberly Van Auken, et al. Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) Cellular Component curation, 2009, BMC Bioinformatics.