A statistical approach for identifying differential distributions in single-cell RNA-seq experiments

The ability to quantify cellular heterogeneity is a major advantage of single-cell technologies. However, statistical methods often treat cellular heterogeneity as a nuisance. We present a novel method to characterize differences in expression in the presence of distinct expression states within and among biological conditions. We demonstrate that this framework can detect differential expression patterns under a wide range of settings. Compared to existing approaches, this method has higher power to detect subtle differences in gene expression distributions that are more complex than a mean shift, and can characterize those differences. The freely available R package scDD implements the approach.
