Reverse Engineering Gene Networks: A Comparative Study at Genome-scale

Motivation: Reverse engineering gene networks from expression data is a widely studied problem, for which numerous mathematical models have been developed. Network reconstruction methods can be used to study specific pathways, or can be applied at the whole-genome scale to analyze large compendiums of expression datasets and uncover genome-wide interactions. However, few methods scale to such large numbers of genes and experiments, and to date, genome-scale comparative assessment of network reconstruction methods has largely been limited to simpler organisms such as E. coli.

Results: In this paper, we analyze 11,760 microarray experiments on the model plant Arabidopsis thaliana drawn from public repositories. We generate genome-scale networks of Arabidopsis using three different methods (Pearson correlation, mutual information, and graphical Gaussian modeling) and compare these networks to test how robustly they recover relationships between functionally related genes. We demonstrate that functional grouping of microarray experiments into different tissue types and experimental conditions is important for discovering context-specific interactions. Our comparisons include benchmarking against experimentally confirmed interactions, benchmarking against the Arabidopsis network resource AraNet, and the study of specific pathways. Our results show that networks generated by the mutual-information-based method have better functional modularity, as measured by both connected component and sub-network extraction analysis with respect to gene sets selected from the brassinosteroid and stress regulation pathways.

Availability: The classification datasets and constructed genome-scale networks are publicly available at http://alurulab.cc.gatech.edu/arabidopsis-networks
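To make the three association measures named above concrete, the following is a minimal Python sketch on toy data. It is not the implementation used in the study. The function names, the equal-width histogram estimator for mutual information, the small ridge term added before inverting the covariance matrix, and the synthetic four-gene chain are all assumptions chosen for illustration.

```python
import numpy as np


def pearson_matrix(X):
    """Pairwise Pearson correlation; X has one row per gene, one column per experiment."""
    return np.corrcoef(X)


def mutual_information(x, y, bins=8):
    """Plug-in mutual information estimate (in nats) from an equal-width 2D histogram.

    Assumption: a simple histogram estimator stands in for whatever
    estimator a production pipeline would use.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def partial_correlation_matrix(X, ridge=1e-3):
    """Partial correlations from the inverse covariance (graphical Gaussian model).

    Assumption: a small ridge term keeps the covariance invertible; it is
    an illustrative regularizer, not a tuned shrinkage estimator.
    """
    S = np.cov(X)
    P = np.linalg.inv(S + ridge * np.eye(S.shape[0]))
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return R


def toy_expression(n_experiments=2000, seed=0):
    """Hypothetical data: a chain a -> b -> c plus an unrelated gene d."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n_experiments)
    b = a + 0.5 * rng.normal(size=n_experiments)
    c = b + 0.5 * rng.normal(size=n_experiments)
    d = rng.normal(size=n_experiments)
    return np.vstack([a, b, c, d])


X = toy_expression()
pearson = pearson_matrix(X)
pcor = partial_correlation_matrix(X)
mi_ab = mutual_information(X[0], X[1])
mi_ad = mutual_information(X[0], X[3])

print("Pearson a-c:", round(float(pearson[0, 2]), 3))
print("Partial correlation a-c:", round(float(pcor[0, 2]), 3))
print("MI a-b vs a-d:", round(mi_ab, 3), round(mi_ad, 3))
```

In this toy chain, genes a and c are strongly correlated only through b, so the Pearson score for the pair is high while the partial correlation is close to zero. That illustrates why the three methods can yield different networks from the same expression data.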
