Management and Analysis of Genomic Functional and Phenotypic Controlled Annotations to Support Biomedical Investigation and Practice

The growing body of available genomic information provides new opportunities for novel research approaches and original biomedical applications, provided that effective data management and analysis support is available. In fact, integration and comprehensive evaluation of available controlled data can highlight information patterns that lead to new biomedical knowledge. Here, we describe Genome Function INtegrated Discoverer (GFINDer), a Web-accessible three-tier multidatabase system we developed to automatically enrich lists of user-classified genes with several functional and phenotypic controlled annotations, and to statistically evaluate them in order to identify annotation categories significantly over- or underrepresented in each considered gene class. Genomic controlled annotations from Gene Ontology (GO), KEGG, Pfam, InterPro, and Online Mendelian Inheritance in Man (OMIM) were integrated in GFINDer, and several categorical tests were implemented for their analysis. A controlled vocabulary of inherited disorder phenotypes was obtained by normalizing and hierarchically structuring the signs and symptoms accompanying each disease, as listed in the OMIM clinical synopsis sections. GFINDer's modular architecture is well suited to further system expansion and to sustaining an increasing workload. Test results showed that GFINDer analyses can highlight functional and phenotypic characteristics of genes and the differences among gene classes. This demonstrates its value in supporting genomic biomedical approaches that aim to understand the complex biomolecular mechanisms underlying patho-physiological phenotypes, and in helping to transfer genomic results to medical practice.
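The idea of testing whether an annotation category is over- or underrepresented in a gene class can be illustrated with a hypergeometric test, one common categorical test for this kind of question. The abstract does not say which tests GFINDer implements, so the sketch below is an assumption made for illustration only; the function names and parameters (`hypergeom_pmf`, `enrichment_p_values`, `k`, `N`, `K`, `n`) are hypothetical and do not come from GFINDer.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """Probability that exactly k of n sampled genes carry an annotation,
    when K of the N genes in the whole list carry it."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def enrichment_p_values(k, N, K, n):
    """One-sided hypergeometric p-values for a single annotation category.

    N: genes in the whole user list (illustrative assumption)
    K: genes in the list annotated with the category
    n: genes in the class under consideration
    k: genes in that class annotated with the category

    Returns (p_over, p_under): the probability of seeing at least k,
    and at most k, annotated genes in the class by chance alone.
    """
    lowest = max(0, n - (N - K))   # fewest annotated genes the class can hold
    highest = min(n, K)            # most annotated genes the class can hold
    if not lowest <= k <= highest:
        raise ValueError("k is outside the feasible range for N, K, n")
    p_over = sum(hypergeom_pmf(i, N, K, n) for i in range(k, highest + 1))
    p_under = sum(hypergeom_pmf(i, N, K, n) for i in range(lowest, k + 1))
    return p_over, p_under


if __name__ == "__main__":
    # Hypothetical counts: 200 genes in the list, 30 carry the annotation,
    # and 12 of the 40 genes in one class carry it.
    p_over, p_under = enrichment_p_values(k=12, N=200, K=30, n=40)
    print(f"over-representation p = {p_over:.4g}")
    print(f"under-representation p = {p_under:.4g}")
```

A small `p_over` suggests the category is overrepresented in the class, and a small `p_under` suggests it is underrepresented. When many categories are tested at once, the resulting p-values would normally also need a multiple-testing correction, which this sketch leaves out.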
