Branch-recombinant Gaussian processes for analysis of perturbations in biological time series

Motivation: A common class of behaviour encountered in the biological sciences involves branching and recombination. During branching, a statistical process bifurcates, resulting in two or more potentially correlated processes that may undergo further branching; the converse occurs during recombination, where two or more statistical processes converge. A key objective is to identify the time of this bifurcation (the branch or recombination time) from time series measurements, e.g. by comparing a control time series with a perturbed time series. Gaussian processes (GPs) represent an ideal framework for such analysis, allowing for nonlinear regression that includes a rigorous treatment of uncertainty. Currently, however, GP models exist only for two-branch systems. Here, we highlight how arbitrarily complex branching processes can be built using the correct composition of covariance functions within a GP framework, thus outlining a general framework for the treatment of branching and recombination in the form of branch-recombinant Gaussian processes (B-RGPs).

Results: We first benchmark the performance of B-RGPs against a variety of existing regression approaches, and demonstrate robustness to model misspecification. B-RGPs are then used to investigate the branching patterns of Arabidopsis thaliana gene expression following inoculation with the hemibiotrophic bacterium Pseudomonas syringae DC3000 and a disarmed mutant strain, hrpA. By grouping genes according to the number of branches, we could naturally separate genes involved in the basal immune response from those subverted by the virulent strain, and show enrichment for targets of pathogen protein effectors. Finally, we identify two early-branching genes, WRKY11 and WRKY17, and show that genes that branched at similar times to WRKY11/17 were enriched for W-box binding motifs and for genes differentially expressed in WRKY11/17 knockouts, suggesting that branch time could be used to identify direct and indirect binding targets of key transcription factors.

Availability and implementation: https://github.com/cap76/BranchingGPs

Supplementary information: Supplementary data are available at Bioinformatics online.
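To illustrate what building a branching process from a composition of covariance functions can look like, the following is a minimal sketch of a two-branch joint covariance in Python with NumPy. It is an assumption-laden illustration, not the construction used in the paper or in the linked repository: the perturbed series is modelled as the control process plus a deviation process that is switched on by a sigmoid gate around a hypothetical `branch_time`, and the names `rbf`, `gate`, `two_branch_cov` and all hyperparameter values are invented for this example.

```python
import numpy as np


def rbf(x, y, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance between two 1-D arrays of time points."""
    d = x[:, None] - y[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gate(t, branch_time, steepness=20.0):
    """Sigmoid switch: close to 0 before branch_time, close to 1 after it."""
    return 1.0 / (1.0 + np.exp(-steepness * (t - branch_time)))


def two_branch_cov(t_control, t_perturbed, branch_time):
    """Joint covariance of (control, perturbed) observations.

    Assumed model (illustrative only):
        control(t)   = f(t)
        perturbed(t) = f(t) + gate(t) * h(t)
    with f and h independent zero-mean GPs.
    """
    k_cc = rbf(t_control, t_control)
    k_cp = rbf(t_control, t_perturbed)
    k_pp = rbf(t_perturbed, t_perturbed)
    s = gate(t_perturbed, branch_time)
    # Deviation term: gate(t) * k_h(t, t') * gate(t'), zero before the branch.
    k_dev = s[:, None] * rbf(t_perturbed, t_perturbed, variance=0.5) * s[None, :]
    return np.block([[k_cc, k_cp], [k_cp.T, k_pp + k_dev]])


t = np.linspace(0.0, 4.0, 9)
K = two_branch_cov(t, t, branch_time=2.0)

# Correlation between control and perturbed values at matching time points.
n = len(t)
corr = np.diag(K[:n, n:]) / np.sqrt(np.diag(K[:n, :n]) * np.diag(K[n:, n:]))
print(np.round(corr, 3))
```

Under these assumptions the two series are almost perfectly correlated well before the branch time and decorrelate afterwards. In a full analysis the branch time would be treated as a hyperparameter and inferred from the data, for example by comparing marginal likelihoods across candidate values; further branches or a recombination would require additional gated terms, which this sketch does not attempt.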
