Paper information - Linked annotations: a middle ground for manual curation of biomedical databases and text corpora

Summary

Annotators of text corpora and biomedical databases carry out the same labor-intensive task: manually extracting structured data from unstructured text. Tasks are needlessly repeated because text corpora are widely scattered. We envision that a linked annotation resource unifying many corpora could be a game changer. Such an open forum would help annotators focus on novel annotations and benefit optimally from the energy of many experts. As a proof of concept, we annotated protein subcellular localization in 100 abstracts cited by UniProtKB. The detailed comparison between our new corpus and the original UniProtKB annotations revealed sustained novel annotations for 42% of the entries (proteins). In a unified linked annotation resource, these could immediately extend the utility of text corpora beyond the text-mining community. Our example motivates the central idea that linked annotations from text corpora can complement database annotations.

Background

The natural language processing (NLP) and biomedical research communities both invest great effort in high-quality manual annotation of biomedical literature. The focus and the annotation strategies of the two communities have, however, differed so much that collaborations have remained stunningly limited. Most text corpora contain detailed markup of only a few types of entities and relationships

Burkhard Rost | Lars Juhl Jensen | Juan Miguel Cejuela | Tatyana Goldberg | Shrikant Vinchurkar
