Reverse Engineering Cellular Networks with Information

Building mathematical models of cellular networks lies at the core of systems biology. It involves, among other tasks, reconstructing the structure of interactions between molecular components, a task known as network inference or reverse engineering. Information theory can help extract as much information as possible from the available data. A large number of methods founded on these concepts have been proposed in the literature, not only in biology journals but across a wide range of fields. Comparing them critically is difficult because they differ in focus and adopt different terminologies. Here we attempt to review some of the existing information-theoretic methodologies for network inference and to clarify their differences. While some of these methods have achieved notable success, many challenges remain, including incomplete measurements, noisy data, counterintuitive behaviour emerging from nonlinear relations or feedback loops, and the computational burden of dealing with large data sets.
