Reverse Engineering and Analysis of Genome-Wide Gene Regulatory Networks from Gene Expression Profiles Using High-Performance Computing

Regulation of gene expression is a carefully regulated phenomenon in the cell. "Reverse-engineering” algorithms try to reconstruct the regulatory interactions among genes from genome-scale measurements of gene expression profiles (microarrays). Mammalian cells express tens of thousands of genes; hence, hundreds of gene expression profiles are necessary in order to have acceptable statistical evidence of interactions between genes. As the number of profiles to be analyzed increases, so do computational costs and memory requirements. In this work, we designed and developed a parallel computing algorithm to reverse-engineer genome-scale gene regulatory networks from thousands of gene expression profiles. The algorithm is based on computing pairwise Mutual Information between each gene-pair. We successfully tested it to reverse engineer the Mus Musculus (mouse) gene regulatory network in liver from gene expression profiles collected from a public repository. A parallel hierarchical clustering algorithm was implemented to discover "communities” within the gene network. Network communities are enriched for genes involved in the same biological functions. The inferred network was used to identify two mitochondrial proteins.

[1]  Gennaro Oliva,et al.  A Parallel Implementation of the Network Identification by Multiple Regression (NIR) Algorithm to Reverse-Engineer Regulatory Gene Networks , 2010, PloS one.

[2]  Ariel S. Schwartz,et al.  An Atlas of Combinatorial Transcriptional Regulation in Mouse and Man , 2010, Cell.

[3]  D. di Bernardo,et al.  How to infer gene networks from expression profiles , 2007, Molecular systems biology.

[4]  Paul P. Wang,et al.  Advances to Bayesian network inference for generating causal networks from observational biological data , 2004, Bioinform..

[5]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[6]  Adam A. Margolin,et al.  Reverse engineering of regulatory networks in human B cells , 2005, Nature Genetics.

[7]  P. Jaccard,et al.  Etude comparative de la distribution florale dans une portion des Alpes et des Jura , 1901 .

[8]  Zaher Dawy,et al.  An approximation to the distribution of finite sample size mutual information estimates , 2005, IEEE International Conference on Communications, 2005. ICC 2005. 2005.

[9]  Steen Knudsen Reverse Engineering of Regulatory Networks , 2005 .

[10]  J. Collins,et al.  Inferring Genetic Networks and Identifying Compound Mode of Action via Expression Profiling , 2003, Science.

[11]  A. Barabasi,et al.  Hierarchical Organization of Modularity in Metabolic Networks , 2002, Science.

[12]  I S Kohane,et al.  Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. , 1999, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[13]  N. D. Clarke,et al.  Towards a Rigorous Assessment of Systems Biology Models: The DREAM3 Challenges , 2010, PloS one.

[14]  Robert R. Sokal,et al.  A statistical method for evaluating systematic relationships , 1958 .

[15]  D. di Bernardo,et al.  Transcriptional gene network inference from a massive dataset elucidates transcriptome organization and gene function , 2011, Nucleic acids research.

[16]  Feng Lin,et al.  A hierarchical clustering algorithm for MIMD architecture , 2004, Comput. Biol. Chem..

[17]  Chris Wiggins,et al.  ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context , 2004, BMC Bioinformatics.

[18]  Marcus Hutter,et al.  Distribution of Mutual Information , 2001, NIPS.

[19]  G. N. Lance,et al.  A General Theory of Classificatory Sorting Strategies: 1. Hierarchical Systems , 1967, Comput. J..

[20]  Kai Wang,et al.  Comparative analysis of microarray normalization procedures: effects on reverse engineering gene networks , 2007, ISMB/ECCB.

[21]  Paul Horton,et al.  Nucleic Acids Research Advance Access published May 21, 2007 WoLF PSORT: protein localization predictor , 2007 .

[22]  Satoru Miyano,et al.  Inferring gene networks from time series microarray data using dynamic Bayesian networks , 2003, Briefings Bioinform..