Human Gene Coexpression Landscape: Confident Network Derived from Tissue Transcriptomic Profiles

Background: Genome-wide microarray analysis of gene expression is widely used in genomic studies to find coexpression patterns and to locate groups of co-transcribed genes. However, most studies at the global "omic" scale do not focus on human samples, and those that do very often use heterogeneous datasets that mix normal with disease-altered samples. Moreover, the technical noise in genome-wide expression microarrays is a well-reported problem that is often not addressed with robust statistical methods, and error estimates for the data are frequently not provided.

Methodology/Principal Findings: Human genome-wide expression data from a controlled set of normal, healthy tissues are used to build a confident human gene coexpression network that avoids both pathological and technical noise. To achieve this, we describe a new method that combines several statistical and computational strategies: robust normalization and expression signal calculation; correlation coefficients obtained by parametric and non-parametric methods; random cross-validations; and estimation of the statistical accuracy and coverage of the data. Together, these methods provide a series of coexpression datasets in which the level of error is measured and can be tuned. To define the errors, the rates of true positives are calculated by assignment to biological pathways. The result is a confident human gene coexpression network that includes 3327 gene-nodes and 15841 coexpression-links; a comparative analysis shows a good improvement over previously published datasets. Further functional analysis of a core subnetwork, validated by two independent methods, shows coherent biological modules that share common transcription factors. The network reveals a map of coexpression clusters organized in well-defined functional constellations. Two major regions of this network correspond to genes involved in nuclear and mitochondrial metabolism, and investigation of their functional assignment indicates that more than 60% are housekeeping and essential genes. The network displays new, previously undescribed gene associations and allows some unknown, unassigned genes to be placed in a functional context based on their interactions with known gene families.

Conclusions/Significance: The identification of stable and reliable human gene-to-gene coexpression networks is essential to unravel the interactions and functional correlations between human genes at an omic scale. This work contributes to that aim. We are making the validated human gene coexpression networks available to the scientific community, to allow further analyses of the network or of specific gene associations. The data are freely available online at http://bioinfow.dep.usal.es/coexpression/.
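As a rough illustration of three of the strategies named in the abstract (parametric and non-parametric correlation, random cross-validation, and true-positive rates estimated from pathway assignment), the following is a minimal Python sketch. It is not the authors' implementation: the function names, the correlation threshold `r_min`, the subsampling fraction `frac`, the number of rounds, the `keep` frequency, and the toy data are all assumptions chosen for illustration. Normalization and signal calculation are not shown.

```python
import numpy as np


def pearson_matrix(x):
    """Pearson correlation between rows (genes) of a genes x samples matrix."""
    return np.corrcoef(x)


def spearman_matrix(x):
    """Spearman correlation computed as Pearson on per-gene ranks.

    Simplification (assumption): ties are not averaged, which is adequate
    for continuous expression values but not for heavily tied data.
    """
    ranks = np.argsort(np.argsort(x, axis=1), axis=1).astype(float)
    return np.corrcoef(ranks)


def stable_pairs(x, r_min=0.7, n_rounds=50, frac=0.8, keep=0.9, seed=0):
    """Return gene pairs whose Pearson AND Spearman coefficients both reach
    r_min in at least a fraction `keep` of random sample subsets.

    All parameter defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = x.shape
    size = max(3, int(round(frac * n_samples)))
    hits = np.zeros((n_genes, n_genes))
    for _ in range(n_rounds):
        # Random cross-validation round: drop a random part of the samples.
        cols = rng.choice(n_samples, size=size, replace=False)
        sub = x[:, cols]
        both = (pearson_matrix(sub) >= r_min) & (spearman_matrix(sub) >= r_min)
        hits += both
    i, j = np.triu_indices(n_genes, k=1)  # each unordered pair once
    ok = hits[i, j] / n_rounds >= keep
    return list(zip(i[ok].tolist(), j[ok].tolist()))


def pathway_precision(pairs, pathways):
    """Fraction of pairs sharing at least one pathway annotation.

    Only pairs in which both genes are annotated are counted; `pathways`
    maps a gene index to a set of pathway labels (hypothetical input).
    """
    tp = fp = 0
    for a, b in pairs:
        pa, pb = pathways.get(a), pathways.get(b)
        if not pa or not pb:
            continue  # cannot be judged without annotation for both genes
        if pa & pb:
            tp += 1
        else:
            fp += 1
    return tp / (tp + fp) if (tp + fp) else float("nan")


# Toy data (assumption): gene 1 tracks gene 0 closely, gene 2 is unrelated.
rng = np.random.default_rng(42)
base = rng.normal(size=40)
expr = np.vstack([
    base,
    base + 0.05 * rng.normal(size=40),
    rng.normal(size=40),
])
toy_pathways = {0: {"A"}, 1: {"A"}, 2: {"B"}}

pairs = stable_pairs(expr)
print("stable pairs:", pairs)
print("pathway precision:", pathway_precision(pairs, toy_pathways))
```

Raising `r_min` or `keep` in this sketch trades coverage (fewer retained pairs) for accuracy (a higher share of pathway-consistent pairs), which mirrors the tunable error level described in the abstract.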
