Order Under Uncertainty: Robust Differential Expression Analysis Using Probabilistic Models for Pseudotime Inference

Single-cell gene expression profiling can be used to quantify transcriptional dynamics in temporal processes, such as cell differentiation, by using computational methods to label each cell with a ‘pseudotime’ where true time series experimentation is too difficult to perform. However, owing to the high variability in gene expression between individual cells, there is an inherent uncertainty in the precise temporal ordering of the cells. Existing methods for pseudotime estimation have predominantly given point estimates, precluding a rigorous analysis of the implications of this uncertainty. We use probabilistic modelling techniques to quantify pseudotime uncertainty and propagate it into downstream differential expression analysis. We demonstrate that reliance on a point estimate of pseudotime can lead to inflated false discovery rates, and that probabilistic approaches provide greater robustness as well as measures of the temporal resolution that can be obtained from pseudotime inference.
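
As an illustration of what propagating pseudotime uncertainty into differential expression analysis can look like, the following is a minimal sketch and not the method used in this work. It assumes that posterior samples of pseudotime are available (here they are simulated by adding Gaussian noise to a known ordering), and it uses a simple Pearson correlation test with a Fisher z-transform as a stand-in for a differential expression test. The function names `correlation_pvalue` and `de_support`, the noise levels, and the threshold `alpha` are all hypothetical choices made for the example.

```python
import math
import numpy as np


def correlation_pvalue(x, y):
    """Two-sided p-value for a Pearson correlation (Fisher z approximation)."""
    r = np.corrcoef(x, y)[0, 1]
    r = float(np.clip(r, -0.999999, 0.999999))  # keep atanh finite
    z = math.atanh(r) * math.sqrt(len(x) - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


def de_support(expression, pseudotime_samples, alpha=0.05):
    """Fraction of pseudotime samples in which the gene is called significant."""
    calls = [correlation_pvalue(t, expression) < alpha for t in pseudotime_samples]
    return float(np.mean(calls))


# Synthetic data (assumption): one gene that increases along a known ordering.
rng = np.random.default_rng(0)
n_cells, n_samples = 100, 200
true_time = np.linspace(0.0, 1.0, n_cells)
gene = 2.0 * true_time + rng.normal(0.0, 0.1, n_cells)

# Two hypothetical posteriors: one tightly concentrated, one very diffuse.
precise = true_time + rng.normal(0.0, 0.05, (n_samples, n_cells))
diffuse = true_time + rng.normal(0.0, 5.0, (n_samples, n_cells))

support_precise = de_support(gene, precise)
support_diffuse = de_support(gene, diffuse)
print(support_precise, support_diffuse)
```

A call made from a single point estimate corresponds to looking at one row of `pseudotime_samples`. Summarising over all rows instead shows how often the call survives when the ordering is redrawn, which is one simple way to see how a diffuse pseudotime posterior weakens the evidence for a gene.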
