ezTag: tagging biomedical concepts via interactive learning

Abstract: Recently, advanced text-mining techniques have been shown to speed up manual data curation by providing human annotators with automated pre-annotations generated by rules or machine learning models. Because the available training data are limited, however, current annotation systems focus primarily on common concept types such as genes or diseases. To support annotating a wide variety of biological concepts with or without pre-existing training data, we developed ezTag, a web-based annotation tool that lets curators annotate text and supply training data in a human-in-the-loop workflow. ezTag supports both abstracts in PubMed and full-text articles in PubMed Central. It also provides lexicon-based concept tagging as well as state-of-the-art pre-trained taggers such as TaggerOne, GNormPlus and tmVar. ezTag is freely available at http://eztag.bioqrator.org.
