Information theoretic approaches for inference of biological networks from continuous-valued data

Background
Characterising programs of gene regulation by studying individual protein-DNA and protein-protein interactions would require a large volume of high-resolution proteomics data, and such data are not yet available. Instead, many gene regulatory network (GRN) inference techniques have been developed that leverage the wealth of transcriptomic data generated by recent consortia to study indirect, gene-level relationships between transcriptional regulators. Despite their popularity, existing methods of GRN inference exhibit limitations that we highlight and address through the lens of information theory.

Results
We introduce new model-free and non-linear information theoretic measures for the inference of GRNs and other biological networks from continuous-valued data. Although previous tools have used mutual information to infer pairwise associations, they either introduce statistical bias through discretisation or are limited to modelling undirected relationships. Our approach overcomes both of these limitations, as demonstrated by a substantial improvement in empirical performance on a set of 160 GRNs of varying size and topology.

Conclusions
The information theoretic measures described in this study yield substantial improvements over previous approaches (e.g. ARACNE) and have been implemented in the latest release of NAIL (Network Analysis and Inference Library). However, despite their theoretical and empirical advantages, these new measures do not circumvent the fundamental limitation of indeterminacy exhibited across this class of biological networks. These methods have already found value in computational neurobiology, and will likely gain traction for GRN analysis as the volume and quality of temporal transcriptomics data continue to improve.
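
The discretisation bias mentioned in the Results can be illustrated with a minimal sketch. It is not the estimator implemented in NAIL or ARACNE: it assumes equal-width binning and a simple plug-in mutual information estimate, and the function name `binned_mutual_information`, the sample size and the bin counts are hypothetical choices made only for illustration. Two independent Gaussian samples have a true mutual information of zero, yet the binned estimate is positive and grows as the bins become finer.

```python
import math
import random
from collections import Counter


def binned_mutual_information(xs, ys, bins):
    """Plug-in mutual information estimate (in nats) after equal-width binning."""
    def discretise(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins
        # Clamp the index so that the maximum value falls into the last bin.
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretise(xs), discretise(ys)
    n = len(xs)
    joint = Counter(zip(bx, by))      # counts for each occupied (x-bin, y-bin) cell
    px, py = Counter(bx), Counter(by)  # marginal bin counts
    # Sum of p(i,j) * log(p(i,j) / (p(i) * p(j))) over occupied cells.
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [rng.gauss(0.0, 1.0) for _ in range(200)]  # independent of x, so true MI is 0

mi_coarse = binned_mutual_information(x, y, bins=4)
mi_fine = binned_mutual_information(x, y, bins=16)
print(f"4 bins: {mi_coarse:.3f} nats, 16 bins: {mi_fine:.3f} nats")
```

Because the estimate depends on the number of bins rather than on any real dependence between the variables, scores computed this way are hard to compare across gene pairs, which is one motivation for estimators that operate directly on continuous values.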
