BioDEAL: Biological data-evidence-annotation linkage system

Publication databases in biomedicine (e.g., PubMed, MEDLINE) are growing rapidly every year, as are public databases of experimental biological data and the annotations derived from that data. Publications often contain evidence that confirms or disproves annotations such as putative protein functions. However, it is increasingly difficult for biologists to identify and process published evidence, owing to the volume of papers and the lack of a systematic approach for associating published evidence with experimental data and annotations. Natural language processing (NLP) tools can help address the growing divide by providing automatic high-throughput detection of simple terms in publication text, but they are not yet mature enough to identify complex terms, relationships, or events. In this paper we present BioDEAL, a community evidence annotation system that introduces a feedback loop into the database-publication cycle to allow scientists to connect data-driven biological concepts to publications.
