PseudotimeDE: inference of differential gene expression along cell pseudotime with well-calibrated p-values from single-cell RNA sequencing data

To investigate the molecular mechanisms underlying cell state changes, a crucial analysis is to identify differentially expressed (DE) genes along a continuous cell trajectory, which can be estimated by pseudotime inference from single-cell RNA-sequencing (scRNA-seq) data. However, existing methods that identify DE genes based on inferred pseudotime do not account for the uncertainty in pseudotime inference. In addition, they either produce ill-posed p-values that hinder control of the false discovery rate (FDR) or rely on restrictive models that reduce the power of DE gene identification. To overcome these drawbacks, we propose PseudotimeDE, a robust method that accounts for the uncertainty in pseudotime inference and thus identifies DE genes along cell pseudotime with well-calibrated p-values. PseudotimeDE is flexible: users can specify the pseudotime inference method and choose the appropriate model for their scRNA-seq data. Comprehensive simulations and real-data applications verify that PseudotimeDE provides well-calibrated p-values, which are essential for FDR control and downstream analysis, and that PseudotimeDE is more powerful than existing methods at identifying DE genes.
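To illustrate why well-calibrated p-values matter for FDR control, the following minimal sketch applies the Benjamini-Hochberg procedure to a list of per-gene p-values. The abstract does not state which FDR procedure is used, so the choice of Benjamini-Hochberg is an assumption, and the function name `benjamini_hochberg` and the example p-values are hypothetical, not part of PseudotimeDE.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: BH is used here only as a common FDR-controlling procedure;
    the abstract does not name a specific procedure.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values (illustrative only, not real results).
example_pvalues = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(example_pvalues)
# Genes whose adjusted p-value is at or below the target FDR are called DE.
de_calls = [q <= 0.05 for q in adjusted]
```

The guarantee of such a procedure depends on the input p-values being valid under the null hypothesis; if the p-values are miscalibrated, the nominal FDR threshold no longer reflects the actual false discovery rate.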

[58]  Jingyi Jessica Li,et al.  Bipartite Tight Spectral Clustering (BiTSC) Algorithm for Identifying Conserved Gene Co-clusters in Two Species , 2019, bioRxiv.