Annotating the human genome with Disease Ontology

Background: The human genome has been extensively annotated with Gene Ontology for biological functions, but it has received only minimal computational annotation for diseases.

Results: We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive subset of UMLS, which is disease-focused and structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations.

Conclusion: The validation data set suggests a 91% recall rate and 97% precision rate for disease annotation using GeneRIF, in contrast with 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows comparisons to be made between disease-containing databases and increases accuracy in disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating the human genome with Disease Ontology and GeneRIF dramatically increases the coverage of disease annotation of the human genome.
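The recall and precision measurements mentioned above can be illustrated with a minimal sketch. It assumes that annotations are represented as sets of (gene, disease) pairs and that a predicted pair counts as correct only when it appears in the reference set. The function name `precision_recall` and the toy gene-disease pairs are hypothetical, chosen for illustration; they are not taken from the Homayouni collection or from the authors' pipeline.

```python
from typing import Set, Tuple

# A gene-disease annotation as a (gene symbol, disease term) pair.
Pair = Tuple[str, str]


def precision_recall(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted annotations against a gold set."""
    true_positives = len(predicted & gold)
    # Precision: fraction of predicted pairs that are in the gold set.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold pairs that were recovered.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data, for illustration only.
gold = {
    ("APP", "Alzheimer's disease"),
    ("PSEN1", "Alzheimer's disease"),
    ("SNCA", "Parkinson's disease"),
    ("HTT", "Huntington's disease"),
}
predicted = {
    ("APP", "Alzheimer's disease"),
    ("PSEN1", "Alzheimer's disease"),
    ("SNCA", "Parkinson's disease"),
    ("APP", "Parkinson's disease"),  # not in the gold set: a false positive
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case three of the four predicted pairs are correct and three of the four gold pairs are recovered, so both measures are 0.75. A source with few but reliable annotations would instead show high precision with low recall, which is the pattern the abstract reports for OMIM.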
