Trajectory-based differential expression analysis for single-cell sequencing data

Trajectory inference has radically enhanced single-cell RNA-seq research by enabling the study of dynamic changes in gene expression. Downstream of trajectory inference, it is vital to discover genes that are (i) associated with the lineages in the trajectory, or (ii) differentially expressed between lineages, to illuminate the underlying biological processes. Current data analysis procedures, however, either fail to exploit the continuous resolution provided by trajectory inference, or fail to pinpoint the exact types of differential expression. We introduce tradeSeq, a powerful generalized additive model framework based on the negative binomial distribution that allows flexible inference of both within-lineage and between-lineage differential expression. By incorporating observation-level weights, the model can additionally account for zero inflation. We evaluate the method on simulated datasets and on real datasets from droplet-based and full-length protocols, and show that it yields biological insights through a clear interpretation of the data.

Downstream of trajectory inference for cell lineages based on scRNA-seq data, differential expression analysis yields insight into biological processes. Here, Van den Berge et al. develop tradeSeq, a framework for the inference of within-lineage and between-lineage differential expression, based on negative binomial generalized additive models.
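To make the modelling idea concrete, a negative binomial generalized additive model with one smoother per lineage could be written as in the sketch below. The notation is an assumption introduced here for illustration and does not appear in the abstract: $Y_{gi}$ is the count of gene $g$ in cell $i$, $T_{li}$ the pseudotime of cell $i$ along lineage $l$, $Z_{li}$ an indicator assigning cell $i$ to lineage $l$, $N_i$ a sequencing-depth offset, and $b_k$ a set of $K$ spline basis functions.

```latex
% Illustrative sketch only: symbols are assumed, not taken from the abstract.
\begin{align}
  Y_{gi} &\sim \mathrm{NB}(\mu_{gi}, \phi_g), \\
  \log \mu_{gi} &= \sum_{l=1}^{L} s_{gl}(T_{li})\, Z_{li} + \log N_i, \\
  s_{gl}(t) &= \sum_{k=1}^{K} b_k(t)\, \beta_{glk}.
\end{align}
% Within-lineage question: does expression of gene g change along lineage l?
%   H_0 : \beta_{gl1} = \beta_{gl2} = \dots = \beta_{glK}
% Between-lineage question: do two lineages share the same smoother?
%   H_0 : \beta_{g1k} = \beta_{g2k} \quad \text{for all } k = 1, \dots, K
```

Under this formulation, both types of differential expression reduce to hypotheses about the smoother coefficients $\beta_{glk}$. A within-lineage test asks whether a smoother is flat, and a between-lineage test asks whether two smoothers coincide. Observation-level weights would enter the fit as per-cell, per-gene weights on the likelihood contributions.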
