Noise regularization removes correlation artifacts in single-cell RNA-seq data preprocessing

With the rapid advancement of single-cell RNA-seq (scRNA-seq) technology, many data preprocessing methods have been proposed to address the systematic errors and technical variability inherent in this technology. While these methods have been shown to be effective in recovering individual gene expression, their suitability for inferring gene-gene associations and for subsequent gene network reconstruction has not been systematically investigated. In this study, we benchmarked five representative scRNA-seq normalization/imputation methods on Human Cell Atlas bone marrow data with respect to their impact on inferred gene-gene associations. Our results suggested that a considerable number of spurious correlations were introduced during the data preprocessing steps due to over-smoothing of the raw data. We proposed a model-agnostic noise regularization method that can effectively eliminate these correlation artifacts. The noise-regularized gene-gene correlations were further used to reconstruct a gene co-expression network, which successfully revealed several known immune cell modules.
