Influence of Statistical Estimators of Mutual Information and Data Heterogeneity on the Inference of Gene Regulatory Networks

The inference of gene regulatory networks from gene expression data is a difficult problem because the performance of inference algorithms depends on many different factors. In this paper, we study two of them. First, we investigate the influence of discrete mutual information (MI) estimators on the global and local network inference performance of the C3NET algorithm. More precisely, we study different MI estimators (Empirical, Miller-Madow, Shrink and Schürmann-Grassberger) in combination with different discretization methods (equal frequency, equal width and global equal width discretization). We observe the best global and local inference performance of C3NET for the Miller-Madow estimator combined with equal width discretization. Second, our numerical analysis can be considered a systems approach because we simulate gene expression data from an underlying gene regulatory network instead of sampling them from an assumed probability distribution. We demonstrate that, despite its popularity as the traditional way of studying MI estimators, this distributional approach is in fact not supported by simulated and biological expression data because of their heterogeneity. Hence, our study provides guidance for the efficient design of simulation studies in the context of network inference, in support of a systems approach.
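
The abstract names the ingredients of the comparison without showing how they fit together. The following is a minimal Python sketch, not the study's own code, of how a discretization step feeds an entropy estimator and how MI is then assembled as I(X;Y) = H(X) + H(Y) - H(X,Y). All function names are hypothetical; only the Empirical (plug-in) and Miller-Madow estimators are shown, with Shrink and Schürmann-Grassberger omitted; entropies are in nats; and the sqrt(N) rule for the number of bins is an assumed heuristic. Global equal width discretization is modeled, as an assumption, by passing the minimum and maximum of the whole expression matrix as the bin range instead of each gene's own range.

```python
import math
from collections import Counter


def equal_width(values, n_bins, lo=None, hi=None):
    """Map each value to one of n_bins equally wide intervals on [lo, hi].

    By default lo/hi are the min/max of `values` (per-gene equal width).
    Passing the min/max of the whole expression matrix instead is used here
    as a stand-in for global equal width discretization (an assumption).
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # The maximum lands on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency(values, n_bins):
    """Assign bins by rank so each bin holds roughly the same number of samples.

    Ties keep their input order because Python's sort is stable.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def entropy(symbols, estimator="empirical"):
    """Entropy in nats of a discrete sample (a list of hashable symbols)."""
    counts = Counter(symbols)
    n = len(symbols)
    # Plug-in (maximum likelihood) estimate from observed frequencies.
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    if estimator == "miller-madow":
        # Miller-Madow bias correction: add (m - 1) / (2N), m = occupied bins.
        h += (len(counts) - 1) / (2 * n)
    elif estimator != "empirical":
        raise ValueError(f"unsupported estimator: {estimator}")
    return h


def mutual_information(x_bins, y_bins, estimator="empirical"):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), each term from the same estimator."""
    joint = list(zip(x_bins, y_bins))
    return (entropy(x_bins, estimator) + entropy(y_bins, estimator)
            - entropy(joint, estimator))


if __name__ == "__main__":
    # Toy data: y is a noiseless linear function of x.
    x = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.65, 0.5, 0.05]
    y = [2 * v + 0.01 for v in x]
    n_bins = int(math.sqrt(len(x)))  # sqrt(N) bins: an assumed heuristic
    xb, yb = equal_width(x, n_bins), equal_width(y, n_bins)
    for est in ("empirical", "miller-madow"):
        print(est, mutual_information(xb, yb, est))
```

Because the Miller-Madow correction is applied to each of the three entropy terms, the corrected MI can fall below zero for independent variables (e.g. two perfectly balanced, independent binary variables give exactly -1/(2N)), which is one reason why estimator choice matters when MI values are ranked to infer edges.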
