Current best practices in single‐cell RNA‐seq analysis: a tutorial

Single‐cell RNA‐seq has enabled gene expression to be studied at an unprecedented resolution. The promise of this technology is attracting a growing user base for single‐cell analysis methods. As more analysis tools become available, it is increasingly difficult to navigate this landscape and assemble an up‐to‐date workflow to analyse one's data. Here, we detail the steps of a typical single‐cell RNA‐seq analysis, including pre‐processing (quality control, normalization, data correction, feature selection, and dimensionality reduction) and cell‐ and gene‐level downstream analysis. We formulate current best‐practice recommendations for these steps based on independent comparison studies. We have integrated these recommendations into a workflow, which we apply to a public dataset to illustrate how the steps work in practice. Our documented case study can be found at https://www.github.com/theislab/single-cell-tutorial. This review will serve as a workflow tutorial for new entrants into the field and help established users update their analysis pipelines.
