MIDER: Network Inference with Mutual Information Distance and Entropy Reduction

The prediction of links among variables from a given dataset is a task referred to as network inference or reverse engineering. It is an open problem in bioinformatics and systems biology, as well as in other areas of science. Information theory, which uses concepts such as mutual information, provides a rigorous framework for addressing it. While a number of information-theoretic methods are already available, most of them focus on a particular type of problem and introduce assumptions that limit their generality. Furthermore, many of these methods lack a publicly available implementation. Here we present MIDER, a method for inferring network structures with information-theoretic concepts. It consists of two steps. First, it provides a representation of the network in which the distance between nodes indicates their statistical closeness. Second, it refines the prediction of the existing links to distinguish between direct and indirect interactions and to assign directionality. The method accepts as input time-series data related to some quantitative features of the network nodes (e.g., concentrations, if the nodes are chemical species). It takes into account time delays between variables, and allows choosing among several definitions and normalizations of mutual information. It is general purpose: it may be applied to any type of network, cellular or otherwise. A Matlab implementation including source code and data is freely available (http://www.iim.csic.es/~gingproc/mider.html). The performance of MIDER has been evaluated on seven different benchmark problems that cover the main types of cellular networks, including metabolic, gene regulatory, and signaling. Comparisons with state-of-the-art information-theoretic methods have demonstrated the competitive performance of MIDER, as well as its versatility.
Its use does not demand any a priori knowledge from the user; the default settings and the adaptive nature of the method provide good results for a wide range of problems without requiring tuning.
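
To make the idea of mutual information with time delays concrete, the following Python sketch estimates a normalized mutual information between two time series and maximizes it over a small range of lags. It is an illustration under stated assumptions, not the MIDER implementation (which is distributed as Matlab code): the function names, the equal-width histogram estimator, the number of bins, and the normalization by the larger marginal entropy are all choices made here for the example.

```python
import numpy as np


def entropy(counts):
    """Shannon entropy (bits) of a histogram given as an array of counts."""
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def mutual_information(x, y, bins=8):
    """Histogram estimate of I(X;Y) = H(X) + H(Y) - H(X,Y), in bits.

    Returns the tuple (mi, hx, hy). The binning is an assumption of this sketch.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    hx = entropy(joint.sum(axis=1))  # marginal entropy of x
    hy = entropy(joint.sum(axis=0))  # marginal entropy of y
    hxy = entropy(joint.ravel())     # joint entropy
    return hx + hy - hxy, hx, hy


def best_lagged_mi(x, y, max_lag=3, bins=8):
    """Normalized MI between x(t) and y(t + lag), maximized over lag = 0..max_lag.

    Normalization by max(H(X), H(Y)) is one illustrative option among several.
    Returns (normalized_mi, lag).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_value, best_lag = 0.0, 0
    for lag in range(max_lag + 1):
        xs = x[: len(x) - lag]  # x at time t
        ys = y[lag:]            # y at time t + lag
        mi, hx, hy = mutual_information(xs, ys, bins=bins)
        denom = max(hx, hy)
        value = mi / denom if denom > 0 else 0.0
        if value > best_value:
            best_value, best_lag = value, lag
    return best_value, best_lag
```

Given two series `x` and `y` of equal length, `best_lagged_mi(x, y)` returns the largest normalized value and the lag at which it occurs. A quantity such as one minus that value could then serve as a simple stand-in for a distance between nodes, in the spirit of the first step described above.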
