Discrete distributional differential expression (D3E) - a tool for gene expression analysis of single-cell RNA-seq data

Background: The advent of high throughput RNA-seq at the single-cell level has opened up new opportunities to elucidate the heterogeneity of gene expression. One of the most widespread applications of RNA-seq is to identify genes which are differentially expressed between two experimental conditions.

Results: We present a discrete, distributional method for differential gene expression (D3E), a novel algorithm specifically designed for single-cell RNA-seq data. We use synthetic data to evaluate D3E, demonstrating that it can detect changes in expression, even when the mean level remains unchanged. Since D3E is based on an analytically tractable stochastic model, it provides additional biological insights by quantifying biologically meaningful properties, such as the average burst size and frequency. We use D3E to investigate experimental data, and with the help of the underlying model, we directly test hypotheses about the driving mechanism behind changes in gene expression.

Conclusion: Evaluation using synthetic data shows that D3E performs better than other methods for identifying differentially expressed genes since it is designed to take full advantage of the information available from single-cell RNA-seq experiments. Moreover, the analytical model underlying D3E makes it possible to gain additional biological insights.
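The claim that a distributional method can detect a change in expression even when the mean level is unchanged can be illustrated with a small sketch. This is not the D3E algorithm: the abstract does not specify its test, so the example substitutes a two-sample Kolmogorov-Smirnov statistic with a permutation p-value as a stand-in. The read counts, the function names `ks_statistic` and `permutation_pvalue`, and the parameters `n_perm` and `seed` are all hypothetical and chosen only for illustration.

```python
import random
from bisect import bisect_right


def ks_statistic(x, y):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    xs, ys = sorted(x), sorted(y)
    points = sorted(set(xs) | set(ys))
    return max(
        abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        for v in points
    )


def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """Permutation p-value for the KS statistic (stand-in test, not D3E's)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = ks_statistic(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Re-split the shuffled pool into two groups of the original sizes.
        if ks_statistic(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical read counts for one gene, 40 cells per condition.
steady = [8, 9, 10, 11, 12] * 8   # unimodal, centred on 10
bursty = [0] * 20 + [20] * 20     # half the cells off, half high

mean_steady = sum(steady) / len(steady)
mean_bursty = sum(bursty) / len(bursty)
d = ks_statistic(steady, bursty)
p = permutation_pvalue(steady, bursty)

print(f"means: {mean_steady} vs {mean_bursty}")
print(f"KS statistic: {d}, permutation p-value: {p:.4f}")
```

Both conditions have the same mean, so a comparison of means alone would see no difference, while the distributional statistic separates the two samples clearly.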
