Systematic determination of the mitochondrial proportion in human and mice tissues for single-cell RNA sequencing data quality control

Motivation: Quality control (QC) is a critical step in single-cell RNA-seq (scRNA-seq) data analysis. Low-quality cells are removed during QC to avoid misinterpretation of the data. One important QC metric is the mitochondrial proportion (mtDNA%), which is used as a threshold to filter out low-quality cells. Early publications in the field established a threshold of 5%, which has since been used as the default in several software packages for scRNA-seq data analysis and adopted as a standard in many scRNA-seq studies. However, the validity of a uniform threshold across species, single-cell technologies, tissues, and cell types has not been adequately assessed.

Results: We systematically analyzed 5,530,106 cells reported in 1,349 annotated datasets available in the PanglaoDB database and found that the average mtDNA% in scRNA-seq data is significantly higher across human tissues than across mouse tissues. This difference is not confounded by the platform used to generate the data. Based on this finding, we propose new mtDNA% reference values for 121 mouse tissues and 44 human tissues. In general, for mouse tissues, the 5% threshold distinguishes well between healthy and low-quality cells. For human tissues, however, the 5% threshold should be reconsidered, as it fails to accurately discriminate between healthy and low-quality cells in 29.5% (13 of 44) of the tissues analyzed. We conclude that omitting the mtDNA% QC filter or adopting a suboptimal mtDNA% threshold may lead to erroneous biological interpretations of scRNA-seq data.

Availability: The code used to download the datasets, perform the analyses, and produce the figures is available at https://github.com/dosorio/mtProportion.

Contact: dcosorioh@tamu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
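As an illustration of the mtDNA% filter discussed above, the following is a minimal sketch, not the authors' code (their repository is linked under Availability). It assumes each cell is a mapping from gene symbol to read count, that mitochondrial genes can be recognized by the `MT-` (human) or `mt-` (mouse) symbol prefix, and that a cell is kept when its mtDNA% is at or below the threshold. The names `mito_percent` and `filter_cells` and the toy counts are hypothetical.

```python
from typing import Dict, List

Counts = Dict[str, int]  # gene symbol -> read/UMI count for one cell


def mito_percent(counts: Counts, prefix: str = "MT-") -> float:
    """Percentage of a cell's total counts assigned to mitochondrial genes.

    Assumption: mitochondrial genes are those whose symbol starts with
    `prefix`, compared case-insensitively so that both human ("MT-") and
    mouse ("mt-") symbols match.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0  # empty cell: no counts, so report 0 rather than divide by zero
    mito = sum(c for gene, c in counts.items()
               if gene.upper().startswith(prefix.upper()))
    return 100.0 * mito / total


def filter_cells(cells: Dict[str, Counts], threshold: float = 5.0) -> List[str]:
    """Return IDs of cells whose mtDNA% is at or below `threshold` (in percent)."""
    return [cell_id for cell_id, counts in cells.items()
            if mito_percent(counts) <= threshold]


# Toy data, invented for illustration only.
cells = {
    "cell_a": {"MT-CO1": 2, "ACTB": 98},     # 2% mitochondrial
    "cell_b": {"MT-ND1": 30, "GAPDH": 70},   # 30% mitochondrial
    "cell_c": {"mt-Co1": 8, "Actb": 92},     # 8% mitochondrial (mouse-style symbols)
}

kept_default = filter_cells(cells)                  # 5% threshold
kept_relaxed = filter_cells(cells, threshold=10.0)  # a looser, tissue-specific choice
```

Passing the threshold as a parameter rather than hard-coding 5% reflects the abstract's point that a single cutoff may not suit every species or tissue; the 10% value here is only an example, not a recommended reference value.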
