Scanpro: robust proportion analysis for single cell resolution data

In higher organisms, individual cells respond to signals and perturbations through epigenetic regulation, for example by adjusting gene expression. Beyond shifting the transcriptional profile of individual cells, this adaptive response can also change the proportions of the different cell types in a tissue. Methods such as scRNA-seq interrogate expression at the single-cell level and can quantify individual cell type clusters within complex tissue samples. Differential proportion analysis has recently been introduced to identify clusters whose composition differs between biological conditions. However, bioinformatics tools for robust proportion analysis of both replicated and unreplicated single-cell datasets are still lacking. In this manuscript, we present Scanpro, a modular tool for proportion analysis that integrates seamlessly into widely accepted frameworks in the Python environment. Scanpro is fast, accurate, supports datasets without replicates, and is intended for bioinformatics experts and beginners alike.
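To make the idea of proportion analysis concrete, the following is a minimal sketch of its first step: turning per-cell cluster labels into per-sample cell type proportions and comparing them between two conditions. It is not Scanpro's API or its statistical model. The toy data, the function names `sample_proportions` and `mean_transformed_difference`, and the use of the arcsine square root transform (one common variance-stabilizing choice for proportions) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import asin, sqrt
from statistics import mean

# Hypothetical toy data: one (sample, condition, cell_type) record per cell.
cells = (
    [("s1", "control", "T cell")] * 60 + [("s1", "control", "B cell")] * 40
    + [("s2", "control", "T cell")] * 50 + [("s2", "control", "B cell")] * 50
    + [("s3", "treated", "T cell")] * 80 + [("s3", "treated", "B cell")] * 20
    + [("s4", "treated", "T cell")] * 90 + [("s4", "treated", "B cell")] * 10
)


def sample_proportions(cells):
    """Return ({sample: {cell_type: proportion}}, {sample: condition})."""
    counts = defaultdict(Counter)
    condition_of = {}
    for sample, condition, cell_type in cells:
        counts[sample][cell_type] += 1
        condition_of[sample] = condition
    props = {
        sample: {ct: n / sum(c.values()) for ct, n in c.items()}
        for sample, c in counts.items()
    }
    return props, condition_of


def mean_transformed_difference(props, condition_of, cell_type, group_a, group_b):
    """Difference (group_b - group_a) of mean arcsin(sqrt(p)) for one cell type."""
    def group_mean(group):
        return mean(
            asin(sqrt(p.get(cell_type, 0.0)))  # absent cell type counts as 0
            for sample, p in props.items()
            if condition_of[sample] == group
        )
    return group_mean(group_b) - group_mean(group_a)


props, condition_of = sample_proportions(cells)
diff_t = mean_transformed_difference(props, condition_of, "T cell", "control", "treated")
diff_b = mean_transformed_difference(props, condition_of, "B cell", "control", "treated")
print(f"T cell shift: {diff_t:+.3f}, B cell shift: {diff_b:+.3f}")
```

A real analysis would follow this with a statistical test per cluster and a multiple-testing correction. The sketch stops at the effect size because the test itself depends on whether replicates are available.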
