Sharq, A versatile preprocessing and QC pipeline for Single Cell RNA-seq

Despite the meteoric rise of single-cell RNA-seq, only a few preprocessing pipelines exist that can perform all steps from the original FASTQ files to a gene expression table ready for further analysis. Here we present Sharq, a versatile preprocessing pipeline designed to work with plate-based 3’-end protocols that include Unique Molecular Identifiers (UMIs). Sharq performs stringent step-wise trimming of reads, assigns them to features according to a flexible hierarchical model, and uses the barcode and UMI information to avoid amplification biases and produce gene expression tables. Additionally, Sharq provides an extensive plate diagnostics report for quality control and troubleshooting, including the troubleshooting of spatial artefacts. The diagnostics report includes measures of the quality of the individual plate wells as well as a robust assessment of which wells contain material from live cells. Collectively, the innovative approaches presented here provide a valuable tool for processing and quality control of single-cell RNA-seq data.
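As an illustration of how barcode and UMI information can counter amplification bias in general, the following minimal sketch collapses reads that share the same well barcode, gene and UMI into a single counted molecule. It is not Sharq's implementation: the function name `count_umis`, the input tuple layout and the example reads are all assumptions made for this example, and it ignores sequencing errors in UMIs.

```python
from collections import defaultdict

def count_umis(reads):
    """Count distinct UMIs per (gene, well) pair.

    `reads` is an iterable of (well_barcode, umi, gene) tuples; this
    layout is a hypothetical one chosen for the illustration.
    """
    seen = defaultdict(set)
    for well, umi, gene in reads:
        # Reads with an identical well, gene and UMI are treated as
        # amplification duplicates of one original molecule.
        seen[(gene, well)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}

# Made-up example reads, not real data.
reads = [
    ("A1", "ACGT", "GeneX"),
    ("A1", "ACGT", "GeneX"),  # duplicate of the read above
    ("A1", "TTGA", "GeneX"),
    ("B2", "ACGT", "GeneX"),
    ("B2", "CCAT", "GeneY"),
]
counts = count_umis(reads)
```

In this sketch the two identical reads in well A1 contribute a single count, so the resulting table reflects molecules rather than reads.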
