Comparing association network algorithms for reverse engineering of large-scale gene regulatory networks: synthetic versus real data

MOTIVATION Inferring a gene regulatory network exclusively from microarray expression profiles is a difficult but important task. The aim of this work is to compare the predictive power of some of the most popular algorithms under different conditions (such as data taken at equilibrium or time courses) and on both synthetic and real microarray data. We are particularly interested in comparing similarity measures of both linear type (correlations and partial correlations) and non-linear type (mutual information and conditional mutual information), and in investigating the underdetermined case (fewer samples than genes).

RESULTS In our simulations, all network inference algorithms perform better on data produced with 'structural' perturbations, such as gene knockouts at steady state, than on data produced with any dynamical perturbation. The predictive power of all algorithms is confirmed on a reverse engineering problem from Escherichia coli gene profiling data: the edges of the 'physical' network of transcription factor-binding sites are significantly overrepresented among the highest-weighted edges of the graph that we infer directly from the data without any structure supervision. Comparing synthetic and in vivo data on the same network graph gives an indication of how much more complex a real transcriptional regulation program is than an artificial model.

AVAILABILITY Software is freely available at the URL http://people.sissa.it/~altafini/papers/SoBiAl07/.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
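To make the similarity measures named in the abstract concrete, the following is a minimal Python sketch of how an association network can be scored with Pearson correlation, full-order partial correlation, and pairwise mutual information. It is not the authors' software. The function names, the ridge term used to keep the covariance matrix invertible when there are fewer samples than genes, and the histogram-based mutual information estimator with 8 bins are all assumptions made for illustration.

```python
import numpy as np


def pearson_scores(X):
    """Absolute Pearson correlation between genes. X has shape (samples, genes)."""
    return np.abs(np.corrcoef(X, rowvar=False))


def partial_corr_scores(X, ridge=1e-3):
    """Absolute full-order partial correlations from the inverse covariance.

    The ridge term is an illustrative assumption: it keeps the matrix
    invertible when samples < genes (the underdetermined case).
    """
    S = np.cov(X, rowvar=False)
    P = np.linalg.inv(S + ridge * np.eye(S.shape[0]))
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return np.abs(R)


def mutual_info_scores(X, bins=8):
    """Pairwise mutual information (nats) from a 2-D histogram plug-in estimate."""
    n_genes = X.shape[1]
    M = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            joint, _, _ = np.histogram2d(X[:, i], X[:, j], bins=bins)
            p = joint / joint.sum()
            px = p.sum(axis=1, keepdims=True)  # marginal of gene i
            py = p.sum(axis=0, keepdims=True)  # marginal of gene j
            nz = p > 0                         # skip empty cells (0 * log 0 = 0)
            M[i, j] = M[j, i] = np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz]))
    return M


def top_edges(W, k):
    """Return the k highest-weighted undirected edges (i, j) with i < j."""
    iu = np.triu_indices_from(W, k=1)
    order = np.argsort(W[iu])[::-1][:k]
    return [(int(iu[0][o]), int(iu[1][o])) for o in order]
```

On a toy chain in which gene 0 drives gene 1 and gene 1 drives gene 2, the correlation between genes 0 and 2 is typically high even though there is no direct edge, whereas the partial correlation conditions on gene 1 and is typically close to zero. Ranking edges with `top_edges` and checking how many known edges fall among the highest-weighted ones mirrors, in a very simplified way, the overrepresentation test described in the results.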
