Probabilistic modeling of bifurcations in single-cell gene expression data using a Bayesian mixture of factor analyzers

Modeling bifurcations in single-cell transcriptomics data has become an increasingly popular field of research. Several methods have been proposed to infer bifurcation structure from such data, but all rely on heuristic, non-probabilistic inference. Here we propose the first generative, fully probabilistic model for such inference, based on a Bayesian hierarchical mixture of factor analyzers. Our model exhibits competitive performance on large datasets despite implementing full Markov chain Monte Carlo sampling, and its unique hierarchical prior structure enables automatic determination of the genes driving the bifurcation process. We additionally propose an empirical-Bayes-like extension that accounts for the high levels of zero-inflation in single-cell RNA-seq data, and we quantify when such models are useful. We apply our model to both real and simulated single-cell gene expression data and compare the results to existing pseudotime methods. Finally, we discuss both the merits and weaknesses of such a unified, probabilistic approach in the context of practical bioinformatics analyses.
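
To make the idea of a generative mixture of factor analyzers concrete, the following is a minimal simulation sketch and not the model specified in the paper. It assumes a single latent factor (pseudotime `t`), two branches that share a gene-wise intercept `c` and differ only in their factor loadings `k`, Gaussian noise, and a constant dropout probability standing in for zero-inflation. The function name `simulate_bifurcation` and all of its parameters are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_bifurcation(n_cells=200, n_genes=20, noise_sd=0.5,
                         dropout_rate=0.0, seed=0):
    """Simulate expression from a two-component mixture of factor analyzers.

    Illustrative assumptions (not taken from the paper): one latent factor,
    a shared intercept per gene, branch-specific loadings, Gaussian noise,
    and a constant per-entry dropout probability.
    """
    rng = np.random.default_rng(seed)

    # Latent pseudotime for each cell (the single factor).
    t = rng.uniform(0.0, 1.0, size=n_cells)

    # Branch assignment for each cell (the mixture component).
    branch = rng.integers(0, 2, size=n_cells)

    # Shared intercepts and branch-specific loadings for each gene.
    c = rng.normal(0.0, 1.0, size=n_genes)
    k = rng.normal(0.0, 2.0, size=(2, n_genes))

    # Both branches coincide at t = 0 and diverge as t grows.
    mean = c + k[branch] * t[:, None]
    y = mean + rng.normal(0.0, noise_sd, size=mean.shape)

    # Crude zero-inflation: each entry is set to zero with fixed probability.
    dropped = rng.random(y.shape) < dropout_rate
    y[dropped] = 0.0

    return t, branch, y, c, k


if __name__ == "__main__":
    t, branch, y, c, k = simulate_bifurcation(dropout_rate=0.2)
    print(y.shape, float((y == 0).mean()))
```

In this sketch, genes whose loadings differ strongly between the two rows of `k` are the ones that separate the branches, which is the kind of signal a hierarchical prior over branch-specific loadings could be used to detect.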
