Fast identification of differential distributions in single-cell RNA-sequencing data with waddR

Abstract

Motivation: Single-cell gene expression distributions measured by single-cell RNA-sequencing (scRNA-seq) often display complex differences between samples. These differences are biologically meaningful but cannot be identified using standard methods for differential expression.

Results: Here, we derive and implement a flexible and fast differential distribution testing procedure based on the 2-Wasserstein distance. Our method is able to detect any type of difference in distribution between conditions. To interpret distributional differences, we decompose the 2-Wasserstein distance into terms that capture the relative contribution of changes in mean, variance and shape to the overall difference. Finally, we derive mathematical generalizations that allow our method to be used in a broad range of disciplines other than scRNA-seq or bioinformatics.

Availability and implementation: Our methods are implemented in the R/Bioconductor package waddR, which is freely available at https://github.com/goncalves-lab/waddR, along with documentation and examples.

Supplementary information: Supplementary data are available at Bioinformatics online.
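To make the decomposition concrete, the following is a minimal Python sketch, not the waddR implementation. It assumes the squared 2-Wasserstein distance between two one-dimensional samples is approximated from their empirical quantile functions on a shared grid, and it uses the standard identity that splits this quantity into a location term (difference in means), a size term (difference in standard deviations) and a shape term (the remainder, driven by the correlation between the two quantile functions). The function name `wasserstein2_decomposition` and the parameter `n_quantiles` are hypothetical and chosen for illustration only.

```python
import numpy as np


def wasserstein2_decomposition(x, y, n_quantiles=1000):
    """Approximate the squared 2-Wasserstein distance between two samples
    and split it into location, size and shape terms.

    Illustrative sketch only: the name, signature and quantile grid are
    assumptions, not the waddR API.
    """
    # Evaluate both empirical quantile functions at the same midpoints of (0, 1).
    probs = (np.arange(n_quantiles) + 0.5) / n_quantiles
    qx = np.quantile(np.asarray(x, dtype=float), probs)
    qy = np.quantile(np.asarray(y, dtype=float), probs)

    # Squared distance: mean squared difference between the quantile functions.
    total = float(np.mean((qx - qy) ** 2))

    # Means and (population) standard deviations of the quantile vectors.
    mx, my = qx.mean(), qy.mean()
    sx, sy = qx.std(), qy.std()

    location = float((mx - my) ** 2)  # contribution of the shift in mean
    size = float((sx - sy) ** 2)      # contribution of the change in spread

    # Shape: 2 * sx * sy * (1 - rho), where rho is the Pearson correlation
    # between the two quantile vectors. It vanishes if either sample is constant.
    if sx > 0 and sy > 0:
        rho = float(np.mean((qx - mx) * (qy - my)) / (sx * sy))
        shape = float(2.0 * sx * sy * (1.0 - rho))
    else:
        shape = 0.0

    return {"total": total, "location": location, "size": size, "shape": shape}


if __name__ == "__main__":
    # Two toy "expression" samples that differ only by a shift of 2 units.
    control = np.linspace(0.0, 1.0, 101)
    treated = control + 2.0
    print(wasserstein2_decomposition(control, treated))
```

On the discretized quantile vectors the three terms sum to the total up to floating-point error, so each term divided by the total gives a relative contribution of the kind the abstract describes. A significance test, such as the testing procedure mentioned above, would have to be layered on top and is not part of this sketch.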
