Today's biomedical research depends heavily on access to biological knowledge encoded in expert-curated biological databases (e.g. Swiss-Prot). As the volume of biological literature grows rapidly, it becomes increasingly difficult for human curators to keep up, because manual curation is expensive and time-consuming. Past research has shown that (semi-)automated approaches have the potential to greatly improve the productivity of manual curation [1-3]. We recently developed PubTator, a web-based application that assists literature curation through the use of various text mining tools [4-6]. PubTator has several unique features. First, PubTator is a web-based system, so it requires no installation and works on any computing platform that has a Web browser installed. Second, PubTator features a PubMed-like interface, which many human curators find familiar and easy to use with minimal training. Third, PubTator integrates multiple competition-winning text mining approaches that we recently developed for recognizing important biological entities: Genes/Proteins, Diseases, Mutations, Chemicals/Drugs, and Organisms [7-11]. Hence, its text-mined results reflect state-of-the-art performance. Lastly, PubTator is kept in sync with PubMed content through nightly updates. Interested users can access our text-mined results via a) the PubTator web interface, b) a RESTful API, or c) FTP download. We have conducted a formal text-mining-aided curation experiment, the results of which showed that PubTator greatly improved both curation efficiency and accuracy [6]. More recently, PubTator has been successfully deployed in practice for the curation of CDC's human genome epidemiology knowledge base.
Hence, we conclude that our text-mining tools and PubTator can provide practical benefits to literature curation in bioinformatics research. PubTator is freely available at: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/
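As a rough illustration of how downloaded text-mined results might be consumed, the following minimal sketch parses annotation lines and groups the recognized mentions by entity type. The layout it assumes (one tab-separated line per annotation with the fields PMID, start offset, end offset, mention, entity type, and concept identifier, plus a `PMID|t|title` line) is an assumption for illustration and is not specified in this text. The names `Annotation`, `parse_annotations`, and `group_by_type`, and the sample record, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Annotation(NamedTuple):
    """One text-mined entity mention (field layout is an assumption)."""
    pmid: str
    start: int
    end: int
    mention: str
    entity_type: str
    concept_id: str


def parse_annotations(text: str) -> List[Annotation]:
    """Parse tab-separated annotation lines; skip any other line."""
    annotations = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 6:
            continue  # e.g. title/abstract lines or blank lines
        pmid, start, end, mention, entity_type, concept_id = fields
        if not (start.isdigit() and end.isdigit()):
            continue  # offsets must be non-negative integers
        annotations.append(
            Annotation(pmid, int(start), int(end), mention, entity_type, concept_id)
        )
    return annotations


def group_by_type(annotations: List[Annotation]) -> Dict[str, List[str]]:
    """Collect mention strings under their entity type."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann.entity_type].append(ann.mention)
    return dict(grouped)


# Made-up sample record for illustration only (not real PubTator output).
SAMPLE = (
    "0000001|t|BRCA1 mutations in breast cancer\n"
    "0000001\t0\t5\tBRCA1\tGene\t672\n"
    "0000001\t19\t32\tbreast cancer\tDisease\tD001943\n"
)

if __name__ == "__main__":
    for entity_type, mentions in group_by_type(parse_annotations(SAMPLE)).items():
        print(entity_type, mentions)
```

Keeping the character offsets alongside each mention lets a curator's tool highlight the exact span in the title or abstract, which is why the sketch stores them as integers rather than discarding them.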
[1] Joel D. Martin, et al. PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine. BMC Bioinformatics, 2003.
[2] Zhiyong Lu, et al. PubTator: a web-based text mining tool for assisting biocuration. Nucleic Acids Res., 2013.
[3] 김삼묘, et al. Introducing the special issue on "Bioinformatics". 2000.
[4] Hung-Yu Kao, et al. Cross-species gene normalization by species inference. BMC Bioinformatics, 2011.
[5] Beatrice Alex, et al. Assisted Curation: Does Text Mining Really Help? Pacific Symposium on Biocomputing, 2007.
[6] Zhiyong Lu, et al. tmVar: a text mining approach for extracting sequence variants in biomedical literature. Bioinform., 2013.
[7] Zhiyong Lu, et al. SR4GN: A Species Recognition Software Tool for Gene Normalization. PloS one, 2012.
[8] Zhiyong Lu, et al. Accelerating literature curation with text-mining tools: a case study of using PubTator to curate genes in PubMed abstracts. Database J. Biol. Databases Curation, 2012.
[9] Zhiyong Lu, et al. - like interactive curation system for document triage and literature curation. 2012.
[10] Zhiyong Lu, et al. DNorm: disease name normalization with pairwise learning to rank. Bioinform., 2013.
[11] Zhiyong Lu, et al. NCBI at the BioCreative IV CHEMDNER Task: Recognizing chemical names in PubMed articles with tmChem. 2013.
[12] Kimberly Van Auken, et al. Semi-automated curation of protein subcellular localization: a text mining-based approach to Gene Ontology (GO) Cellular Component curation. BMC Bioinformatics, 2009.