Stratified Test Alleviates Batch Effects in Single-Cell Data

Analyzing single-cell sequencing data across batches is challenging. We find that the Van Elteren test, a stratified version of the Wilcoxon rank-sum test, elegantly mitigates the problem. We also modify the common language effect size to accompany this test, further improving its utility. On both simulated and real patient data, we show that the Van Elteren test controls false positives and false negatives. The modified effect size also estimates the differences between cell types more accurately.
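As a rough illustration of the idea, the following is a minimal sketch of a Van Elteren test in which each batch is treated as one stratum. It uses the textbook form of the test: per-stratum Wilcoxon rank sums are combined with weights 1/(N_h + 1), and a normal approximation with tie correction gives the p-value. The function names (`midranks`, `van_elteren`) are hypothetical, and the stratified effect size returned here (a weighted average of per-stratum P(X > Y) estimates) is an assumption made for illustration, not necessarily the modification proposed in this work.

```python
import math
from collections import Counter


def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def van_elteren(x_by_stratum, y_by_stratum):
    """Van Elteren test sketch (hypothetical helper, not the authors' code).

    x_by_stratum, y_by_stratum: lists of per-batch samples for groups X and Y.
    Returns (z, two-sided p-value, stratified effect size).
    The effect size is an ASSUMED stratified analogue of the common language
    effect size: per-stratum U / (m * n), averaged with weights m * n / (N + 1).
    """
    stat = mean = var = 0.0
    u_weighted = weight_total = 0.0
    for x, y in zip(x_by_stratum, y_by_stratum):
        m, n = len(x), len(y)
        if m == 0 or n == 0:
            continue  # a stratum with only one group carries no information
        N = m + n
        pooled = list(x) + list(y)
        ranks = midranks(pooled)
        W = sum(ranks[:m])  # rank sum of group X within this stratum
        ties = sum(t ** 3 - t for t in Counter(pooled).values())
        w = 1.0 / (N + 1)  # Van Elteren weight
        stat += w * W
        mean += w * m * (N + 1) / 2
        var += w ** 2 * (m * n / 12.0) * ((N + 1) - ties / (N * (N - 1)))
        U = W - m * (m + 1) / 2  # Mann-Whitney U for group X
        u_weighted += w * U
        weight_total += w * m * n
    if var == 0:
        raise ValueError("variance is zero; no informative strata")
    z = (stat - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal approximation
    return z, p, u_weighted / weight_total


# Two batches with a large shift between them; within each batch, X < Y.
x_batches = [[1, 2], [11, 12]]
y_batches = [[3, 4], [13, 14]]
z, p, effect = van_elteren(x_batches, y_batches)
print(f"z = {z:.3f}, p = {p:.4f}, effect size = {effect:.2f}")
```

Because ranks are computed within each batch, the shift between the two batches in the toy data never enters the statistic; only the within-batch ordering of the two groups does.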
