Text mining tools for assisting literature curation

Today's biomedical research depends heavily on access to biological knowledge encoded in expert-curated databases (e.g. Swiss-Prot). Because the biological literature is growing rapidly and manual curation is expensive and time-consuming, it is increasingly difficult for human curators to keep up. Past research has shown that (semi-)automated approaches have the potential to greatly improve the productivity of manual curation [1-3].

We recently developed PubTator, a web-based application that assists literature curation through the use of various text-mining tools [4-6]. PubTator has several distinctive features. First, because it is web-based, no installation is required and it is not tied to any specific computing platform: it runs wherever a web browser is available. Second, PubTator features a PubMed-like interface that many human curators find familiar and easy to use with minimal training. Third, PubTator integrates multiple competition-winning text-mining approaches that we recently developed for recognizing key biological entities: genes/proteins, diseases, mutations, chemicals/drugs, and organisms [7-11]. Its text-mined results therefore reflect state-of-the-art performance. Lastly, PubTator stays in sync with PubMed content through nightly updates.

Interested users can access our text-mined results via (a) the PubTator web interface, (b) a RESTful API, or (c) FTP download. We conducted a formal text-mining-aided curation experiment, the results of which showed that PubTator greatly improved both curation efficiency and accuracy [6]. More recently, PubTator has been successfully deployed in practice for curating CDC's human genome epidemiology knowledgebase.
Hence, we conclude that our text-mining tools and PubTator can provide practical benefits to literature curation in bioinformatics research. PubTator is freely available at: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/
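
For readers who download results in bulk, a short parsing sketch may help. The abstract does not describe the download format; the sketch below assumes the tab-delimited PubTator layout as we understand it (a `PMID|t|` title line, a `PMID|a|` abstract line, then one tab-separated line per annotation giving PMID, start offset, end offset, mention text, entity type and concept ID). The helper names `parse_pubtator` and `mentions_by_type`, and the sample record with PMID 12345, are hypothetical and are not part of PubTator itself.

```python
import re
from collections import defaultdict

# ASSUMED PubTator line layouts (not specified in this abstract):
#   PMID|t|title      PMID|a|abstract
#   PMID<TAB>start<TAB>end<TAB>mention<TAB>type<TAB>concept_id
HEADER = re.compile(r"^(\d+)\|([ta])\|(.*)$")


def parse_pubtator(text):
    """Hypothetical helper: return {pmid: {"title", "abstract", "annotations"}}."""
    docs = defaultdict(lambda: {"title": "", "abstract": "", "annotations": []})
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines only separate documents
        header = HEADER.match(line)
        if header:
            pmid, section, content = header.groups()
            docs[pmid]["title" if section == "t" else "abstract"] = content
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # ignore lines that match neither assumed layout
        pmid, start, end, mention, etype = fields[:5]
        docs[pmid]["annotations"].append({
            "start": int(start),  # assumed character offsets into the document text
            "end": int(end),
            "mention": mention,
            "type": etype,  # assumed labels such as Gene, Disease, Chemical
            "id": fields[5] if len(fields) > 5 else "",
        })
    return dict(docs)


def mentions_by_type(doc):
    """Group the distinct mention strings of one parsed document by entity type."""
    grouped = defaultdict(set)
    for ann in doc["annotations"]:
        grouped[ann["type"]].add(ann["mention"])
    return {etype: sorted(names) for etype, names in grouped.items()}


# Invented sample record for illustration only (PMID 12345 is hypothetical).
SAMPLE = (
    "12345|t|BRCA1 mutations in breast cancer.\n"
    "12345|a|A made-up abstract used only to exercise the parser.\n"
    "12345\t0\t5\tBRCA1\tGene\t672\n"
    "12345\t19\t32\tbreast cancer\tDisease\tMESH:D001943\n"
)

if __name__ == "__main__":
    docs = parse_pubtator(SAMPLE)
    print(mentions_by_type(docs["12345"]))
    # {'Gene': ['BRCA1'], 'Disease': ['breast cancer']}
```

Grouping by entity type mirrors the entity classes PubTator recognizes, so a curator could, for instance, skim every disease mention across a batch of abstracts. How the offsets are counted (title only, or title and abstract together) is an assumption here and should be checked against PubTator's own documentation before relying on it.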

[1] Joel D. Martin, et al. PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine. BMC Bioinformatics, 2003.

[2] Zhiyong Lu, et al. PubTator: a web-based text mining tool for assisting biocuration. Nucleic Acids Research, 2013.

[3] 김삼묘, et al. On publishing the special issue on “Bioinformatics” (in Korean). 2000.

[4] Hung-Yu Kao, et al. Cross-species gene normalization by species inference. BMC Bioinformatics, 2011.

[5] Beatrice Alex, et al. Assisted Curation: Does Text Mining Really Help? Pacific Symposium on Biocomputing, 2007.

[6] Zhiyong Lu, et al. tmVar: a text mining approach for extracting sequence variants in biomedical literature. Bioinformatics, 2013.

[7] Zhiyong Lu, et al. SR4GN: A Species Recognition Software Tool for Gene Normalization. PLoS ONE, 2012.

[8] Zhiyong Lu, et al. Accelerating literature curation with text-mining tools: a case study of using PubTator to curate genes in PubMed abstracts. Database: The Journal of Biological Databases and Curation, 2012.

[9] Zhiyong Lu, et al. PubTator: a PubMed-like interactive curation system for document triage and literature curation. 2012.

[10] Zhiyong Lu, et al. DNorm: disease name normalization with pairwise learning to rank. Bioinformatics, 2013.

[11] Zhiyong Lu, et al. NCBI at the BioCreative IV CHEMDNER Task: Recognizing chemical names in PubMed articles with tmChem. 2013.

[12] Kimberly Van Auken, et al. Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) Cellular Component curation. BMC Bioinformatics, 2009.