GenesTrace: Phenomic Knowledge Discovery via Structured Terminology

The era of applied genomic medicine is quickly approaching, accompanied by the increasing availability of detailed genetic information. Understanding the genetic etiology of complex, multi-gene diseases remains an important challenge. To uncover the putative genetic etiology of complex diseases, we designed a method that explores the relationships between two major terminological and ontological resources: the Unified Medical Language System (UMLS) and the Gene Ontology (GO). The UMLS has a mainly clinical emphasis, whereas GO has become the standard for biological annotation of genes and gene products. Using statistical and semantic relationships within and between the two resources, we infer relationships between disease concepts in the UMLS and gene products annotated using GO and its associated databases. We validated our inferences by comparing them to known gene-disease relationships, as defined in the morbidmap of Online Mendelian Inheritance in Man (OMIM). The proof-of-concept methods presented here are unique in that they bypass the ambiguity of directly extracting gene or disease terms from MEDLINE. Additionally, our methods provide direct links to clinically significant diseases through established terminologies or ontologies. These preliminary results indicate the potential utility of exploiting existing, manually curated relationships in biomedical resources as a tool for discovering potentially valuable new gene-disease relationships.
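The general idea of chaining curated links and then checking the result against a reference set can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the statistical or semantic measures used, so the sketch substitutes a plain relational join and a count of supporting terms. All identifiers (`disease_A`, `GO:x1`, `gene_1`, the function names) are hypothetical placeholders, not real UMLS, GO, or OMIM records.

```python
from collections import defaultdict

# Hypothetical toy data: placeholders only, not real UMLS/GO/OMIM content.
# Assumed mapping from disease concepts to GO terms they are linked with.
disease_to_go = {
    "disease_A": {"GO:x1", "GO:x2"},
    "disease_B": {"GO:x3"},
}
# Assumed mapping from GO terms to gene products annotated with them.
go_to_genes = {
    "GO:x1": {"gene_1", "gene_2"},
    "GO:x2": {"gene_2"},
    "GO:x3": {"gene_3"},
}
# Assumed reference set of known (disease, gene) pairs, standing in for OMIM.
known_pairs = {("disease_A", "gene_2"), ("disease_B", "gene_4")}


def infer_pairs(disease_to_go, go_to_genes):
    """Chain disease -> GO term -> gene; count supporting GO terms per pair."""
    support = defaultdict(int)
    for disease, terms in disease_to_go.items():
        for term in terms:
            for gene in go_to_genes.get(term, ()):
                support[(disease, gene)] += 1
    return dict(support)


def validate(inferred, known):
    """Return (precision, recall) of inferred pairs against the known set."""
    hits = set(inferred) & known
    precision = len(hits) / len(inferred) if inferred else 0.0
    recall = len(hits) / len(known) if known else 0.0
    return precision, recall


inferred = infer_pairs(disease_to_go, go_to_genes)
precision, recall = validate(inferred, known_pairs)
print(sorted(inferred.items()))
print(precision, recall)
```

In this toy setup, inferred pairs that are absent from the reference set are the candidate new gene-disease relationships, and the support count is only a crude stand-in for whatever ranking the actual method applies.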
