stochprofML: stochastic profiling using maximum likelihood estimation in R

Background: Tissues are often heterogeneous in their single-cell molecular expression, and this heterogeneity can govern the regulation of cell fate. Understanding development and disease therefore requires quantifying the heterogeneity in a given tissue.

Results: We present the R package stochprofML, which uses the maximum likelihood principle to parameterize heterogeneity from the cumulative expression of small random pools of cells. We evaluate the algorithm’s performance in simulation studies and present further application opportunities.

Conclusion: In stochastic profiling, the need to demix pooled samples is outweighed by savings in experimental cost and effort and by reduced measurement error. The approach offers possibilities for parameterizing heterogeneity, estimating underlying pool compositions, and detecting differences between cell populations across samples.
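The idea of estimating heterogeneity from pooled measurements can be illustrated with a small sketch. The Python code below is not the stochprofML API and is not taken from the package. It assumes a two-population model in which each cell's expression is lognormal, a shared log-scale standard deviation, and known log-means, and it estimates only the fraction of cells in the first population. The density of a sum of lognormals is approximated by moment matching to a single lognormal (a Fenton-style approximation, assumed here for simplicity). All function names and parameter values are hypothetical.

```python
import math
import random


def simulate_pools(num_pools, cells_per_pool, p, mu, sigma, seed=0):
    """Simulate pooled measurements: each pool is the summed expression of
    cells_per_pool cells; each cell belongs to population 0 with probability p."""
    rng = random.Random(seed)
    pools = []
    for _ in range(num_pools):
        total = 0.0
        for _ in range(cells_per_pool):
            pop = 0 if rng.random() < p else 1
            total += rng.lognormvariate(mu[pop], sigma)
        pools.append(total)
    return pools


def matched_lognormal(k, n, mu, sigma):
    """Moment-match the sum of k cells from population 0 and n - k cells from
    population 1 to a single lognormal; returns its log-mean and log-sd."""
    s2 = sigma ** 2
    mean = k * math.exp(mu[0] + s2 / 2) + (n - k) * math.exp(mu[1] + s2 / 2)
    var = (math.exp(s2) - 1) * (
        k * math.exp(2 * mu[0] + s2) + (n - k) * math.exp(2 * mu[1] + s2)
    )
    log_var = math.log(1 + var / mean ** 2)
    return math.log(mean) - log_var / 2, math.sqrt(log_var)


def lognormal_pdf(y, m, s):
    """Density of a lognormal with log-mean m and log-sd s at y > 0."""
    z = (math.log(y) - m) / s
    return math.exp(-0.5 * z * z) / (y * s * math.sqrt(2 * math.pi))


def log_likelihood(pools, n, p, mu, sigma):
    """Approximate log-likelihood of p: the number of population-0 cells in a
    pool is Binomial(n, p), and each composition has a matched lognormal density."""
    params = [matched_lognormal(k, n, mu, sigma) for k in range(n + 1)]
    weights = [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    total = 0.0
    for y in pools:
        dens = sum(w * lognormal_pdf(y, m, s) for w, (m, s) in zip(weights, params))
        total += math.log(max(dens, 1e-300))  # guard against numerical underflow
    return total


def estimate_fraction(pools, n, mu, sigma):
    """Grid-search maximum likelihood estimate of p over 0.01, 0.02, ..., 0.99."""
    grid = [i / 100 for i in range(1, 100)]
    return max(grid, key=lambda p: log_likelihood(pools, n, p, mu, sigma))


if __name__ == "__main__":
    data = simulate_pools(300, 10, p=0.3, mu=(2.0, 0.0), sigma=0.3, seed=1)
    print("estimated fraction:", estimate_fraction(data, 10, (2.0, 0.0), 0.3))
```

A grid search over one parameter keeps the sketch short. A fuller treatment would estimate the log-means and standard deviation together with the fraction, for example with a general-purpose numerical optimizer.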
