Functional Annotation and Identification of Candidate Disease Genes by Computational Analysis of Normal Tissue Gene Expression Data

Background
High-throughput gene expression data can predict gene function through the “guilt by association” principle: coexpressed genes are likely to be functionally associated.

Methodology/Principal Findings
We analyzed publicly available expression data on normal human tissues. The analysis integrates data obtained with two experimental platforms (microarrays and SAGE) and several measures of dissimilarity between expression profiles. The building blocks of the procedure are the Ranked Coexpression Groups (RCGs): small sets of tightly coexpressed genes that are analyzed in terms of functional annotation. Functionally characterized RCGs are selected by means of the majority rule and used to predict new functional annotations. Functionally characterized RCGs are also enriched in groups of genes associated with similar phenotypes. We exploit this fact to find new candidate disease genes for many OMIM phenotypes of unknown molecular origin.

Conclusions/Significance
We predict new functional annotations for many human genes, showing that the integration of different data sets and coexpression measures significantly improves the scope of the results. Combining gene expression data, functional annotation, and known phenotype-gene associations, we provide candidate genes for several genetic diseases of unknown molecular basis.
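The following is a minimal sketch of the idea behind coexpression groups and majority-rule annotation transfer, not the procedure used in the paper. The abstract does not state the dissimilarity measures, the group size, or the exact form of the majority rule, so everything specific here is an assumption made for illustration: the dissimilarity is taken to be one minus the Pearson correlation, a group is a seed gene plus its `k` nearest neighbours, and a term "wins" when strictly more than half of the group carries it. The gene names, profiles, and terms are toy data, and the integration of microarray and SAGE data is not modelled.

```python
from collections import Counter
from math import sqrt


def pearson_dissimilarity(x, y):
    """Return 1 - Pearson correlation of two equal-length profiles (assumed measure)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # flat profile: treat as uncorrelated
    return 1.0 - cov / (sx * sy)


def ranked_group(seed, profiles, k):
    """Seed gene plus its k least dissimilar genes (ties broken by gene name)."""
    others = [g for g in profiles if g != seed]
    others.sort(key=lambda g: (pearson_dissimilarity(profiles[seed], profiles[g]), g))
    return [seed] + others[:k]


def majority_terms(group, annotations):
    """Terms carried by strictly more than half of the genes in the group."""
    counts = Counter(t for g in group for t in annotations.get(g, ()))
    return {t for t, c in counts.items() if 2 * c > len(group)}


def predict(profiles, annotations, k=3):
    """For each gene, collect majority terms of its groups that it does not yet carry."""
    predictions = {}
    for seed in profiles:
        group = ranked_group(seed, profiles, k)
        terms = majority_terms(group, annotations)
        for g in group:
            new = terms - set(annotations.get(g, ()))
            if new:
                predictions.setdefault(g, set()).update(new)
    return predictions


# Toy data (hypothetical): expression across four tissues and known annotations.
profiles = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],
    "C": [1, 2, 3, 5],
    "D": [4, 3, 2, 1],
    "E": [8, 6, 4, 2],
    "F": [5, 3, 2, 1],
}
annotations = {"A": {"t1"}, "B": {"t1"}, "D": {"t2"}, "E": {"t2"}}

predictions = predict(profiles, annotations, k=2)
print(predictions)
```

In this toy setting the unannotated genes `C` and `F` inherit the term shared by their two closest neighbours. A realistic analysis would also need a significance test for the enrichment and a way to combine several data sets and dissimilarity measures, which the sketch leaves out.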
