Approximate inference of gene regulatory network models from RNA-Seq time series data

Background: Inference of gene regulatory network structures from RNA-Seq data is challenging due to the nature of the data, as measurements take the form of counts of reads mapped to a given gene. Here we present a model for RNA-Seq time series data that applies a negative binomial distribution for the observations, and uses sparse regression with a horseshoe prior to learn a dynamic Bayesian network of interactions between genes. We use a variational inference scheme to learn approximate posterior distributions for the model parameters.

Results: The methodology is benchmarked on synthetic data designed to replicate the distribution of real-world RNA-Seq data. We compare our method to other sparse regression approaches and find improved performance in learning directed networks. We demonstrate an application of our method to a publicly available human neuronal stem cell differentiation RNA-Seq time series data set to infer the underlying network structure.

Conclusions: Our method is able to improve performance on synthetic data by explicitly modelling the statistical distribution of the data when learning networks from RNA-Seq time series. By applying approximate inference techniques, we can learn network structures quickly with only moderate computing resources.
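As a rough illustration of the kind of generative model described above, the following sketch simulates count time series from a first-order dynamic network with negative binomial observations and horseshoe-distributed edge weights. It is not the authors' implementation: the function name `simulate_nb_dbn`, the log link, the use of `log1p` of the previous counts as predictors, the clipping of the linear predictor, and all default parameter values are assumptions made for illustration. The variational inference step is not shown.

```python
import numpy as np

def simulate_nb_dbn(n_genes=5, n_steps=20, dispersion=2.0,
                    global_scale=0.1, baseline=50.0, seed=0):
    """Simulate counts from a hypothetical NB dynamic network (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Horseshoe prior on edge weights:
    #   lambda_ij ~ HalfCauchy(0, 1),  w_ij ~ Normal(0, (tau * lambda_ij)^2)
    # Most weights are shrunk towards zero, while a few can be large.
    local_scales = np.abs(rng.standard_cauchy((n_genes, n_genes)))
    weights = rng.normal(size=(n_genes, n_genes)) * global_scale * local_scales

    intercept = np.log(baseline)
    counts = np.zeros((n_steps, n_genes), dtype=np.int64)

    def draw(log_mean):
        # Assumption: clip the linear predictor as a numerical safeguard
        # against the heavy tails of the Cauchy-distributed local scales.
        mu = np.exp(np.clip(log_mean, -5.0, 10.0))
        # NumPy parameterises NB by (n, p); mean mu with size r gives p = r / (r + mu).
        p = dispersion / (dispersion + mu)
        return rng.negative_binomial(dispersion, p)

    counts[0] = draw(np.full(n_genes, intercept))
    for t in range(1, n_steps):
        # Assumed link: log-mean at time t is linear in log1p of counts at t - 1.
        predictors = np.log1p(counts[t - 1])
        counts[t] = draw(intercept + weights @ predictors)

    return counts, weights

counts, weights = simulate_nb_dbn()
print(counts.shape, weights.shape)
```

Synthetic data of this form could serve as a simple test bed for comparing sparse regression approaches, since the true weight matrix is known.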
