Is My Network Module Preserved and Reproducible?

In many applications, one wants to determine which properties of a network module change across conditions. For example, to validate the existence of a module, it is desirable to show that it is reproducible (or preserved) in an independent test network. Here we study several types of network preservation statistics that do not require a module assignment in the test network. We distinguish network preservation statistics by the type of the underlying network. Some preservation statistics are defined for a general network (defined by an adjacency matrix), while others are defined only for a correlation network (constructed on the basis of pairwise correlations between numeric variables). Our applications show that the correlation structure facilitates the definition of particularly powerful module preservation statistics. We illustrate that evaluating module preservation is, in general, different from evaluating cluster preservation. We find that it is advantageous to aggregate multiple preservation statistics into summary preservation statistics. We illustrate the use of these methods in six gene co-expression network applications including 1) preservation of the cholesterol biosynthesis pathway in mouse tissues, 2) comparison of human and chimpanzee brain networks, 3) preservation of selected KEGG pathways between human and chimpanzee brain networks, 4) sex differences in human cortical networks, and 5) sex differences in mouse liver networks. While we find no evidence for sex-specific modules in human cortical networks, we find that several human cortical modules are less preserved in chimpanzees. In particular, apoptosis genes are differentially co-expressed between humans and chimpanzees. Our simulation studies and applications show that module preservation statistics are useful for studying differences between the modular structure of networks.
Data, R software and accompanying tutorials can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/ModulePreservation.
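
To make the idea of a preservation statistic that needs no module assignment in the test network more concrete, the following is a minimal sketch in Python (standard library only). It is not the accompanying R software and does not reproduce the statistics defined in the paper. The two statistics used here (a "density" statistic, the mean absolute correlation among module genes in the test data, and a "pattern" statistic, the correlation between reference and test pairwise correlations), the permutation Z scores, the plain average used as a summary, and all function names and simulation settings are assumptions chosen for illustration.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlation_matrix(data):
    """data: one list of sample values per gene. Returns a gene-by-gene matrix."""
    n = len(data)
    cor = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        cor[i][j] = cor[j][i] = pearson(data[i], data[j])
    return cor


def pair_values(cor, genes):
    """Pairwise correlations among `genes`, in a fixed pair order."""
    return [cor[i][j] for i, j in combinations(genes, 2)]


def simulate(n_genes, n_samples, loadings, rng):
    """Hypothetical toy data: the first len(loadings) genes share one latent factor."""
    factor = [rng.gauss(0, 1) for _ in range(n_samples)]
    data = []
    for g in range(n_genes):
        a = loadings[g] if g < len(loadings) else 0.0
        noise_sd = (1 - a * a) ** 0.5
        data.append([a * f + noise_sd * rng.gauss(0, 1) for f in factor])
    return data


def preservation_z(ref_cor, test_cor, module, n_perm, rng):
    """Permutation Z scores for two illustrative statistics and their average."""
    ref_pairs = pair_values(ref_cor, module)

    def stats(genes):
        test_pairs = pair_values(test_cor, genes)
        density = mean(abs(r) for r in test_pairs)  # how tightly connected
        pattern = pearson(ref_pairs, test_pairs)    # same connectivity pattern
        return density, pattern

    observed = stats(module)
    all_genes = range(len(test_cor))
    # Null distribution: random gene sets of the same size in the test data.
    null = [stats(rng.sample(all_genes, len(module))) for _ in range(n_perm)]
    z = []
    for k in range(2):
        column = [s[k] for s in null]
        z.append((observed[k] - mean(column)) / stdev(column))
    return {"density": z[0], "pattern": z[1], "summary": mean(z)}


rng = random.Random(0)
loadings = [0.9, 0.8, -0.8, 0.7, -0.7, 0.9, 0.6, -0.9]
module = list(range(len(loadings)))  # module defined in the reference data only

ref = correlation_matrix(simulate(40, 60, loadings, rng))
test_preserved = correlation_matrix(simulate(40, 60, loadings, rng))
test_noise = correlation_matrix(simulate(40, 60, [], rng))

z_preserved = preservation_z(ref, test_preserved, module, 200, rng)
z_noise = preservation_z(ref, test_noise, module, 200, rng)
print("preserved:", z_preserved)
print("not preserved:", z_noise)
```

In this toy setting the module is defined only in the reference data, and the test data enter only through their correlation matrix. Averaging the Z scores is one simple way to combine several statistics into a single summary. Other aggregation rules, such as a median, would fit the same sketch.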
