SCExecute: custom cell barcode-stratified analyses of scRNA-seq data

Abstract

Motivation: In single-cell RNA-sequencing (scRNA-seq) data, stratification of sequencing reads by cellular barcode is necessary to study cell-specific features. However, apart from gene expression, the analyses of cell-specific features are not sufficiently supported by available tools designed for high-throughput sequencing data.

Results: We introduce SCExecute, which executes a user-provided command on barcode-stratified single-cell binary alignment map (scBAM) files that are extracted on the fly. SCExecute extracts the alignments with each cell barcode from aligned, pooled single-cell sequencing data. Simple commands, monolithic programs, multi-command shell scripts or complex shell-based pipelines are then executed on each scBAM file. scBAM files can be restricted to specific barcodes and/or genomic regions of interest. We demonstrate SCExecute with two popular variant callers (GATK and Strelka2), executed in shell scripts together with commands for BAM file manipulation and variant filtering, to detect single-cell-specific expressed single nucleotide variants from droplet scRNA-seq data (10X Genomics Chromium System). In conclusion, SCExecute facilitates custom cell-level analyses on barcoded scRNA-seq data using currently available tools and provides an effective solution for studying low (cellular) frequency transcriptome features.

Availability and implementation: SCExecute is implemented in Python 3 using the Pysam package and distributed for Linux, macOS and Python environments from https://horvathlab.github.io/NGS/SCExecute.

Supplementary information: Supplementary data are available at Bioinformatics online.
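The barcode-stratify-then-execute idea described in the abstract can be illustrated with a minimal Python sketch. This is not SCExecute's implementation: SCExecute works on BAM files through Pysam, whereas the sketch below parses plain SAM text so that it stays dependency-free. It assumes the cell barcode is stored in the `CB` tag, and the function names and the `{FILE}` and `{BARCODE}` placeholders are hypothetical names chosen for this illustration only.

```python
import collections
import subprocess
import sys
import tempfile
from pathlib import Path


def barcode_of(sam_line, tag="CB"):
    """Return the barcode tag value of a SAM alignment line, or None if absent."""
    # Optional tags start after the 11 mandatory SAM columns.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith(tag + ":Z:"):
            return field[len(tag) + 3:]
    return None


def stratify(sam_lines, keep=None):
    """Group alignment lines by barcode; header lines are copied into every group."""
    header = []
    groups = collections.defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            header.append(line)
            continue
        bc = barcode_of(line)
        # Reads without a barcode, or outside the requested set, are skipped.
        if bc is not None and (keep is None or bc in keep):
            groups[bc].append(line)
    return {bc: header + reads for bc, reads in groups.items()}


def run_per_cell(sam_lines, command, keep=None):
    """Write one temporary file per barcode and run `command` (an argv list) on it.

    {FILE} and {BARCODE} in the arguments are hypothetical placeholders
    for this sketch; they are filled in per cell.
    """
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for bc, lines in sorted(stratify(sam_lines, keep).items()):
            path = Path(tmp) / f"{bc}.sam"
            path.write_text("".join(lines))
            argv = [arg.format(FILE=str(path), BARCODE=bc) for arg in command]
            done = subprocess.run(argv, capture_output=True, text=True, check=True)
            results[bc] = done.stdout.strip()
    return results


# Tiny made-up input: two barcodes plus one read that carries no barcode.
DEMO_SAM = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAAC-1\n",
    "r2\t0\tchr1\t120\t60\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:TTTG-1\n",
    "r3\t0\tchr1\t130\t60\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAAC-1\n",
    "r4\t0\tchr1\t140\t60\t4M\t*\t0\t0\tACGT\tFFFF\n",
]

# Stand-in for a real per-cell tool: count the alignment lines in the file.
COUNT_READS = [
    sys.executable,
    "-c",
    "import sys; print(sum(not l.startswith('@') for l in open(sys.argv[1])))",
    "{FILE}",
]

counts = run_per_cell(DEMO_SAM, COUNT_READS)
print(counts)
```

In a real analysis the stand-in command would be replaced by a variant caller or a shell script, and the `keep` argument plays the role of restricting the run to barcodes of interest. Region filtering is left out of the sketch.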
