A Compositional Model to Assess Expression Changes from Single-Cell RNA-Seq Data

We address the problem of scoring genes for evidence of changes in the distribution of single-cell expression. We introduce an empirical Bayesian mixture approach and evaluate its operating characteristics in a range of numerical experiments. The proposed approach leverages cell-subtype structure revealed by cluster analysis to boost gene-level information on expression changes. Cell clustering informs gene-level analysis through a specially constructed prior distribution over pairs of multinomial probability vectors; this prior meshes with available model-based tools that score patterns of differential expression over multiple subtypes. We derive an explicit formula for the posterior probability that a gene has the same distribution in two cellular conditions, allowing for a gene-specific mixture over subtypes in each condition. The advantage comes from the compositional structure of the model, which allows a host of gene-specific mixture components while constraining the mixing proportions at the whole-cell level. This structure leads to a novel form of information sharing through which the cell-clustering results support gene-level scoring of differential distribution. The result, according to our numerical experiments, is improved sensitivity compared to several standard approaches for detecting distributional expression changes.
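The compositional idea can be illustrated with a small numerical sketch. It is not the posterior formula derived in the paper, which the abstract does not state. It assumes only that a gene's expression in each condition is a mixture over cell subtypes, that the mixing proportions are shared by all genes, and that a given gene may use the same mixture component for several subtypes. Under those assumptions, the gene has the same distribution in both conditions when the proportions, summed within each block of subtypes sharing a component, agree. The function names, the block labels, and the example proportions are all hypothetical.

```python
import numpy as np


def aggregate_by_block(proportions, block_labels):
    """Sum subtype mixing proportions within each block of subtypes
    that share one gene-level expression component (assumed labeling)."""
    proportions = np.asarray(proportions, dtype=float)
    block_labels = np.asarray(block_labels)
    blocks = np.unique(block_labels)
    return np.array([proportions[block_labels == b].sum() for b in blocks])


def equivalent_distribution(phi, psi, block_labels, tol=1e-12):
    """Return True when the block-aggregated proportions of the two
    conditions agree, so the gene's two mixtures coincide."""
    agg_phi = aggregate_by_block(phi, block_labels)
    agg_psi = aggregate_by_block(psi, block_labels)
    return bool(np.allclose(agg_phi, agg_psi, atol=tol))


# Hypothetical whole-cell subtype proportions in the two conditions.
phi = [0.5, 0.3, 0.2]  # condition 1
psi = [0.3, 0.5, 0.2]  # condition 2

# Gene A uses one component for subtypes 0 and 1, another for subtype 2.
gene_a_blocks = [0, 0, 1]
# Gene B uses a distinct component for every subtype.
gene_b_blocks = [0, 1, 2]

gene_a_same = equivalent_distribution(phi, psi, gene_a_blocks)
gene_b_same = equivalent_distribution(phi, psi, gene_b_blocks)
```

In this toy setting the shift in proportions between subtypes 0 and 1 is invisible to gene A but changes the distribution of gene B, which is the sense in which cell-level proportions constrain gene-level conclusions.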
