scPipe: A flexible R/Bioconductor preprocessing pipeline for single-cell RNA-sequencing data

Single-cell RNA sequencing (scRNA-seq) technology allows researchers to profile the transcriptomes of thousands of cells simultaneously. Protocols that incorporate both designed (cell) barcodes and random barcodes (unique molecular identifiers, UMIs) have greatly increased the throughput of scRNA-seq, but give rise to a more complex data structure. New tools are needed that can handle the barcoding strategies used by different protocols, exploit this barcode information for sample-level quality assessment, and visualize the results effectively in preparation for higher-level analyses. To this end, we developed scPipe, an R/Bioconductor package that integrates barcode demultiplexing, read alignment, UMI-aware gene-level quantification and quality control of raw sequencing data generated by multiple protocols, including CEL-seq, MARS-seq, Chromium 10X, Drop-seq and Smart-seq. scPipe produces a gene count matrix, the starting point for downstream analysis, together with an HTML report that summarizes data quality. These outputs can serve as input to downstream analyses such as normalization, visualization and statistical testing. scPipe performs this processing with a few simple R commands, promoting reproducible analysis of single-cell data that is compatible with the growing suite of open-source scRNA-seq analysis tools available in R/Bioconductor and beyond. The scPipe R package is available for download from https://www.bioconductor.org/packages/scPipe.
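To make the idea of barcode demultiplexing followed by UMI-aware counting concrete, here is a minimal Python sketch. It is not scPipe's implementation (scPipe is an R/Bioconductor package), and several parts are hypothetical: the read layout (a 6-nt cell barcode followed by a 6-nt UMI), the one-mismatch tolerance, and the function names. The sketch assumes reads have already been aligned and assigned to genes. It counts distinct UMI sequences per cell and gene, and does not correct sequencing errors within the UMIs themselves.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical read layout for illustration only: the first 6 bases of the
# barcode read are the cell barcode and the next 6 bases are the UMI.
BC_START, BC_LEN = 0, 6
UMI_START, UMI_LEN = 6, 6
MIN_READ_LEN = max(BC_START + BC_LEN, UMI_START + UMI_LEN)


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def match_barcode(observed: str, whitelist: List[str],
                  max_mismatch: int = 1) -> Optional[str]:
    """Assign an observed barcode to a known cell barcode.

    An exact match wins; otherwise the read is kept only if exactly one
    whitelisted barcode lies within max_mismatch substitutions.
    """
    if observed in whitelist:
        return observed
    hits = [bc for bc in whitelist
            if len(bc) == len(observed) and hamming(observed, bc) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None


def count_umis(reads: Iterable[Tuple[str, Optional[str]]],
               whitelist: List[str],
               max_mismatch: int = 1) -> Dict[Tuple[str, str], int]:
    """Build sparse gene counts keyed by (cell barcode, gene).

    Each read is a (barcode_read, gene) pair, where gene is None if the read
    was not assigned to a gene. Reads sharing cell, gene and UMI are treated
    as PCR duplicates and counted once.
    """
    umis: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for barcode_read, gene in reads:
        if gene is None or len(barcode_read) < MIN_READ_LEN:
            continue  # unassigned or too-short read
        observed = barcode_read[BC_START:BC_START + BC_LEN]
        cell = match_barcode(observed, whitelist, max_mismatch)
        if cell is None:
            continue  # barcode cannot be assigned to a single known cell
        umi = barcode_read[UMI_START:UMI_START + UMI_LEN]
        umis[(cell, gene)].add(umi)
    return {key: len(seqs) for key, seqs in umis.items()}


# Toy example: two known cells, six reads.
whitelist = ["AAAAAA", "CCCCCC"]
reads = [
    ("AAAAAAGGGGGG", "GeneA"),  # cell AAAAAA, UMI GGGGGG
    ("AAAAAAGGGGGG", "GeneA"),  # duplicate of the read above
    ("AAAAATTTTTTT", "GeneA"),  # 1 mismatch, corrected to AAAAAA; new UMI
    ("CCCCCCGGGGGG", "GeneB"),  # cell CCCCCC
    ("GGGGGGAAAAAA", "GeneB"),  # unknown barcode, discarded
    ("CCCCCCACACAC", None),     # not assigned to a gene, discarded
]
counts = count_umis(reads, whitelist)
print(counts)
```

Requiring a unique whitelist hit within the mismatch tolerance discards a few ambiguous reads. In exchange, a read is never assigned to the wrong cell just because two known barcodes are equally close to it.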
