V-SVA: an R Shiny application for detecting and annotating hidden sources of variation in single-cell RNA-seq data

Abstract

Summary: Single-cell RNA-sequencing (scRNA-seq) technology enables the study of gene expression programs in individual cells. However, these data are subject to diverse sources of variation, including ‘unwanted’ variation that needs to be removed in downstream analyses (e.g. batch effects) and ‘wanted’ or biological sources of variation (e.g. variation associated with a cell type) that need to be precisely described. Surrogate variable analysis (SVA)-based algorithms are commonly used for batch correction and, more recently, for studying ‘wanted’ variation in scRNA-seq data. However, determining whether the resulting surrogate variables are biologically meaningful or stem from technical sources remains a challenge. To facilitate the interpretation of surrogate variables detected by algorithms including IA-SVA, SVA or ZINB-WaVE, we developed an R Shiny application [Visual Surrogate Variable Analysis (V-SVA)] that provides a web-browser interface for the identification and annotation of hidden sources of variation in scRNA-seq data. This interactive framework includes tools for the discovery of genes associated with detected sources of variation, gene annotation using publicly available databases and gene sets, and data visualization using dimension reduction methods.

Availability and implementation: The V-SVA Shiny application is publicly hosted at https://vsva.jax.org/ and the source code is freely available at https://github.com/nlawlor/V-SVA.

Contact: leed13@miamioh.edu or duygu.ucar@jax.org

Supplementary information: Supplementary data are available at Bioinformatics online.
