Network inference and hypotheses-generation from single-cell transcriptomic data using multivariate information measures

Developmental processes are carefully orchestrated. A multicellular organism can emerge from a single fertilised egg cell only because gene expression is robustly regulated in space and time by networks of transcriptional regulators. Single-cell transcriptomic data allow us to probe and map these networks in unprecedented detail. Here we develop an information-theoretic framework to infer candidate gene (co-)regulatory networks and distil mechanistic hypotheses from single-cell data. Information theory offers clear advantages for such data, where cell-to-cell variability is pervasive and sample sizes are large. Higher-order information-theoretic functionals reliably capture interactions and dependencies between genes in both in silico and real data.
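To illustrate why measures beyond pairwise mutual information can matter, the following is a minimal sketch and not the framework developed in the paper. It assumes discretised expression values and simple plug-in (maximum-likelihood) entropy estimates; the function names and the toy XOR-style example are hypothetical and chosen only for illustration.

```python
import numpy as np

def entropy(*columns):
    """Plug-in joint entropy (in bits) of one or more discrete columns."""
    joint = np.stack(columns, axis=1)              # shape: (cells, variables)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)

def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)

def discretise(values, bins=4):
    """Map continuous expression values to equal-width bin indices 0..bins-1
    (an assumed, deliberately simple discretisation)."""
    edges = np.histogram_bin_edges(values, bins=bins)
    return np.digitize(values, edges[1:-1])

# Toy example (hypothetical): gene C is "on" only when exactly one of
# genes A and B is on, so A alone carries no information about C.
a = np.tile([0, 0, 1, 1], 25)
b = np.tile([0, 1, 0, 1], 25)
c = np.bitwise_xor(a, b)

pairwise = mutual_information(a, c)                     # dependence of C on A alone
conditional = conditional_mutual_information(a, c, b)   # dependence once B is known
print(f"I(A;C) = {pairwise:.3f} bits, I(A;C|B) = {conditional:.3f} bits")
```

In this toy case the pairwise measure misses the dependence between A and C, while the measure involving all three genes recovers it. With real single-cell data the plug-in estimates are biased at small sample sizes, so the choice of discretisation and estimator would need care.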
