Tutorial: guidelines for the computational analysis of single-cell RNA sequencing data

Single-cell RNA sequencing (scRNA-seq) is a popular and powerful technology that allows the whole transcriptome of a large number of individual cells to be profiled. However, analyzing the large volumes of data generated by these experiments requires specialized statistical and computational methods. Here we present an overview of the computational workflow involved in processing scRNA-seq data. We discuss some of the most common tasks and the tools available for addressing central biological questions. In this article and on our companion website (https://scrnaseq-course.cog.sanger.ac.uk/website/index.html), we provide guidelines on best practices for performing computational analyses. This tutorial serves as a hands-on guide for experimentalists interested in analyzing their data and as an overview for bioinformaticians seeking to develop new computational methods. In this Tutorial Review, Hemberg et al. present an overview of the computational workflow involved in processing single-cell RNA sequencing data.
