Enhancing droplet-based single-nucleus RNA-seq resolution using the semi-supervised machine learning classifier DIEM

Single-nucleus RNA sequencing (snRNA-seq) measures gene expression in individual nuclei instead of cells, allowing for unbiased cell type characterization in solid tissues. We observe that snRNA-seq is commonly subject to contamination by high amounts of ambient RNA, which, if overlooked, can bias downstream analyses, for example by producing spurious cell types. We present a novel approach to quantify contamination and filter droplets in snRNA-seq experiments, called Debris Identification using Expectation Maximization (DIEM). Our likelihood-based approach models the gene expression distributions of debris and cell types, whose parameters are estimated using expectation maximization (EM). We evaluated DIEM using three snRNA-seq data sets: (1) human differentiating preadipocytes in vitro, (2) fresh mouse brain tissue, and (3) human frozen adipose tissue (AT) from six individuals. All three data sets showed evidence of extranuclear RNA contamination, and we observed that existing methods failed to account for contaminated droplets, which led to spurious cell types. Compared with filtering using these state-of-the-art methods, DIEM better removed droplets containing high levels of extranuclear RNA and led to higher-quality clusters. Although DIEM was designed for snRNA-seq, our clustering strategy also successfully filtered single-cell RNA-seq data. In conclusion, DIEM removes debris-contaminated droplets from single-nucleus and single-cell data quickly and effectively, leading to cleaner downstream analysis. Our code is freely available at https://github.com/marcalva/diem .
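To make the idea of a likelihood-based model fitted with EM more concrete, the following is a minimal sketch of a semi-supervised multinomial mixture. It is not the DIEM implementation: the function name `fit_debris_mixture`, the choice of a multinomial likelihood, the pseudocount, and the rule that pre-labeled droplets (for example, those with very low total counts) stay fixed in a debris component are all assumptions made for illustration.

```python
import numpy as np


def fit_debris_mixture(counts, debris_labeled, n_clusters=2, n_iter=50,
                       pseudocount=0.01, seed=0):
    """Fit a multinomial mixture by EM with some droplets fixed as debris.

    counts         : (n_droplets, n_genes) array of non-negative counts.
    debris_labeled : boolean mask; True droplets are pinned to component 0.
    Returns (resp, profiles): per-droplet component probabilities and the
    per-component gene expression proportions. Hypothetical helper, not DIEM.
    """
    counts = np.asarray(counts)
    debris_labeled = np.asarray(debris_labeled, dtype=bool)
    n_droplets = counts.shape[0]
    rng = np.random.default_rng(seed)

    # Initialise: unlabeled droplets start in a random non-debris component,
    # labeled droplets start (and remain) in the debris component 0.
    start = rng.integers(1, n_clusters, size=n_droplets)
    resp = np.zeros((n_droplets, n_clusters))
    resp[np.arange(n_droplets), start] = 1.0
    resp[debris_labeled] = 0.0
    resp[debris_labeled, 0] = 1.0

    for _ in range(n_iter):
        # M-step: mixing weights and gene proportions from soft assignments.
        weights = resp.sum(axis=0) / n_droplets
        gene_totals = resp.T @ counts + pseudocount
        log_profiles = np.log(gene_totals / gene_totals.sum(axis=1, keepdims=True))

        # E-step: multinomial log-likelihood per component, up to a constant
        # that is the same for every component and cancels on normalisation.
        log_post = counts @ log_profiles.T + np.log(np.maximum(weights, 1e-300))
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)

        # Semi-supervised constraint: labeled droplets never leave component 0.
        resp[debris_labeled] = 0.0
        resp[debris_labeled, 0] = 1.0

    return resp, np.exp(log_profiles)
```

Under these assumptions, a droplet could be flagged as debris when `resp[:, 0]` exceeds a chosen cutoff. Fixing the labeled droplets is what makes the fit semi-supervised: the debris component is anchored to a known profile while the remaining components are free to capture cell types.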
