A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor

Single-cell RNA sequencing (scRNA-seq) is widely used to profile the transcriptome of individual cells. This provides biological resolution that cannot be matched by bulk RNA sequencing, at the cost of increased technical noise and data complexity. The differences between scRNA-seq and bulk RNA-seq data mean that the analysis of the former cannot be performed by recycling bioinformatics pipelines for the latter. Rather, dedicated single-cell methods are required at various steps to exploit the cellular resolution while accounting for technical noise. This article describes a computational workflow for low-level analyses of scRNA-seq data, based primarily on software packages from the open-source Bioconductor project. It covers basic steps including quality control, data exploration and normalization, as well as more complex procedures such as cell cycle phase assignment, identification of highly variable and correlated genes, clustering into subpopulations and marker gene detection. Analyses were demonstrated on gene-level count data from several publicly available datasets involving haematopoietic stem cells, brain-derived cells, T-helper cells and mouse embryonic stem cells. This will provide a range of usage scenarios from which readers can construct their own analysis pipelines.

[1]  John N. Weinstein,et al.  Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics , 2004, BMC Bioinformatics.

[2]  W. Paul,et al.  GATA-3 promotes Th2 responses through three different mechanisms: induction of Th2 cytokine production, selective growth of Th2 cells and inhibition of Th1 cell-specific factors , 2006, Cell Research.

[3]  James M. Roberts,et al.  A New Description of Cellular Quiescence , 2006, PLoS biology.

[4]  Michael D. Wilson,et al.  Cell Cycle Genes Are the Evolutionarily Conserved Targets of the E2F4 Transcription Factor , 2007, PloS one.

[5]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[6]  Bin Zhang,et al.  Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R , 2008, Bioinform..

[7]  M. Robinson,et al.  A scaling normalization method for differential expression analysis of RNA-seq data , 2010, Genome Biology.

[8]  W. Huber,et al.  which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets , 2011 .

[9]  R. Gentleman,et al.  Independent filtering increases detection power for high-throughput experiments , 2010, Proceedings of the National Academy of Sciences.

[10]  G. Smyth,et al.  Statistical Applications in Genetics and Molecular Biology Permutation P -values Should Never Be Zero: Calculating Exact P -values When Permutations Are Randomly Drawn , 2011 .

[11]  Mark D. Robinson,et al.  edgeR: a Bioconductor package for differential expression analysis of digital gene expression data , 2009, Bioinform..

[12]  S. Linnarsson,et al.  Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. , 2011, Genome research.

[13]  Colin N. Dewey,et al.  RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome , 2011, BMC Bioinformatics.

[14]  Davis J. McCarthy,et al.  Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation , 2012, Nucleic acids research.

[15]  Aleksandra A. Kolodziejczyk,et al.  Accounting for technical noise in single-cell RNA-seq experiments , 2013, Nature Methods.

[16]  W. Shi,et al.  The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote , 2013, Nucleic acids research.

[17]  David G Hendrickson,et al.  Differential analysis of gene regulation at transcript resolution with RNA-seq , 2012, Nature Biotechnology.

[18]  Charity W. Law,et al.  voom: precision weights unlock linear model analysis tools for RNA-seq read counts , 2014, Genome Biology.

[19]  J. Skotheim,et al.  Control of cell cycle transcription during G1 and S phases , 2013, Nature Reviews Molecular Cell Biology.

[20]  B. Williams,et al.  From single-cell to cell-pool transcriptomes: Stochasticity in gene expression and RNA splicing , 2014, Genome research.

[21]  Sarah A. Teichmann,et al.  Single-Cell RNA Sequencing Reveals T Helper Cells Synthesizing Steroids De Novo to Contribute to Immune Homeostasis , 2014, Cell reports.

[22]  Gioele La Manno,et al.  Quantitative single-cell RNA-seq with unique molecular identifiers , 2013, Nature Methods.

[23]  Cole Trapnell,et al.  The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells , 2014, Nature Biotechnology.

[24]  W. Huber,et al.  Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 , 2014, Genome Biology.

[25]  A. Oudenaarden,et al.  Validation of noise models for single-cell transcriptomics , 2014, Nature Methods.

[26]  Åsa K. Björklund,et al.  Full-length RNA-seq from single cells using Smart-seq2 , 2014, Nature Protocols.

[27]  Alex A. Pollen,et al.  Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex , 2014, Nature Biotechnology.

[28]  Rona S. Gertner,et al.  Single cell RNA Seq reveals dynamic paracrine control of cellular variation , 2014, Nature.

[29]  Rob Patro,et al.  Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms , 2013, Nature Biotechnology.

[30]  Wei Shi,et al.  featureCounts: an efficient general purpose program for assigning sequence reads to genomic features , 2013, Bioinform..

[31]  Bo Ding,et al.  Normalization and noise reduction for single cell RNA-seq experiments , 2015, Bioinform..

[32]  P. Linsley,et al.  MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data , 2015, Genome Biology.

[33]  S. Linnarsson,et al.  Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq , 2015, Science.

[34]  Ning Leng,et al.  Oscope identifies oscillatory genes in unsynchronized single cell RNA-seq experiments , 2015, Nature Methods.

[35]  E. Pierson,et al.  ZIFA: Dimensionality reduction for zero-inflated single-cell gene expression analysis , 2015, Genome Biology.

[36]  Matthew E. Ritchie,et al.  limma powers differential expression analyses for RNA-sequencing and microarray studies , 2015, Nucleic acids research.

[37]  Amalio Telenti,et al.  Sincell: an R/Bioconductor package for statistical assessment of cell-state hierarchies from single-cell RNA-seq , 2015, Bioinform..

[38]  Allon M. Klein,et al.  Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells , 2015, Cell.

[39]  Raphael Gottardo,et al.  Orchestrating high-throughput genomic analysis with Bioconductor , 2015, Nature Methods.

[40]  Mingxiang Teng,et al.  On the widespread and critical impact of systematic bias and batch effects in single-cell RNA-Seq data , 2015 .

[41]  Sarah A Teichmann,et al.  Computational assignment of cell-cycle stage from single-cell transcriptome data. , 2015, Methods.

[42]  Evan Z. Macosko,et al.  Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets , 2015, Cell.

[43]  S. Teichmann,et al.  Computational and analytical challenges in single-cell transcriptomics , 2015, Nature Reviews Genetics.

[44]  Fabian J Theis,et al.  Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells , 2015, Nature Biotechnology.

[45]  Fabian J. Theis,et al.  Combined Single-Cell Functional and Gene Expression Analysis Resolves Heterogeneity within Stem Cell Populations , 2015, Cell stem cell.

[46]  Aleksandra A. Kolodziejczyk,et al.  Single Cell RNA-Sequencing of Pluripotent States Unlocks Modular Transcriptional Variation , 2015, Cell stem cell.

[47]  Wolfgang Huber,et al.  RNA-Seq workflow: gene-level exploratory analysis and differential expression , 2015, F1000Research.

[48]  Catalina A. Vallejos,et al.  BASiCS: Bayesian Analysis of Single-Cell Sequencing Data , 2015, PLoS Comput. Biol..

[49]  J. Marioni,et al.  Pooling across cells to normalize single-cell RNA sequencing data with many zero counts , 2016, Genome Biology.

[50]  Aaron T. L. Lun,et al.  From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline , 2016, F1000Research.

[51]  Joseph L. Herman,et al.  Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis , 2015, Nature Methods.

[52]  Fabian J. Theis,et al.  destiny: diffusion maps for large-scale single-cell data in R , 2015, Bioinform..

[53]  Hongkai Ji,et al.  TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis , 2016, Nucleic acids research.

[54]  Aleksandra A. Kolodziejczyk,et al.  Classification of low quality cells from single-cell RNA-seq data , 2016, Genome Biology.

[55]  David A. Knowles,et al.  Batch effects and the effective design of single-cell gene expression studies , 2016, bioRxiv.

[56]  Lior Pachter,et al.  Near-optimal probabilistic RNA-seq quantification , 2016, Nature Biotechnology.

[57]  Jong Kyoung Kim,et al.  Corrigendum: Characterizing noise structure in single-cell RNA-seq distinguishes genuine from technical stochastic allelic expression , 2015, Nature Communications.

[58]  Aaron T. L. Lun,et al.  Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R , 2017, Bioinform..