Optimal Gene Filtering for Single-Cell data (OGFSC) - a gene filtering algorithm for single-cell RNA-seq data

Motivation Single-cell transcriptomic data are commonly accompanied by extremely high technical noise due to the low RNA concentrations from individual cells. Precise identification of differentially expressed genes and cell populations are heavily dependent on the effective reduction of technical noise, for example, by gene filtering. However, there is still no well-established standard in the current approaches of gene filtering. Investigators usually filter out genes based on single fixed threshold, which commonly leads to both over- and under-stringent errors. Results In this study, we propose a novel algorithm, termed as OGFSC (Optimal Gene Filtering for Single-Cell data), to construct a thresholding curve based on gene expression levels and the corresponding variances. We validated our method on multiple single-cell RNA-seq datasets, including simulated and published experimental datasets. The results show that the known signal and known noise are reliably discriminated in the simulated datasets. In addition, the results of seven experimental datasets demonstrate that these cells of the same annotated types are more sharply clustered using our method. Interestingly, when we re-analyze the dataset from an aging research recently published in Science, we find a list of regulated genes which is different from that reported in the original study, because of using different filtering methods. However, the knowledge based on our findings better matches the progression of immunosenescence. In summary, we here provide an alternative opportunity to probe into the true level of technical noise in single-cell transcriptomic data. Availability https://github.com/XZouProjects/OGFSC.git. Supplementary information Supplementary data are available at Bioinformatics online.

[1]  Charlotte Soneson,et al.  Bias, robustness and scalability in single-cell differential expression analysis , 2018, Nature Methods.

[2]  J. Marioni,et al.  Heterogeneity in Oct4 and Sox2 Targets Biases Cell Fate in 4-Cell Mouse Embryos , 2016, Cell.

[3]  Shawn M. Gillespie,et al.  Single-Cell Transcriptomic Analysis of Primary and Metastatic Tumor Ecosystems in Head and Neck Cancer , 2017, Cell.

[4]  Allon M. Klein,et al.  Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells , 2015, Cell.

[5]  Rona S. Gertner,et al.  Single cell RNA Seq reveals dynamic paracrine control of cellular variation , 2014, Nature.

[6]  D. Reich,et al.  Population Structure and Eigenanalysis , 2006, PLoS genetics.

[7]  Ben S. Wittner,et al.  Single-Cell RNA Sequencing Identifies Extracellular Matrix Gene Expression by Pancreatic Circulating Tumor Cells , 2014, Cell reports.

[8]  M. Elowitz,et al.  Challenges and emerging directions in single-cell analysis , 2017, Genome Biology.

[9]  Fabian J Theis,et al.  Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells , 2015, Nature Biotechnology.

[10]  A. Oudenaarden,et al.  Validation of noise models for single-cell transcriptomics , 2014, Nature Methods.

[11]  M. Schaub,et al.  SC3 - consensus clustering of single-cell RNA-Seq data , 2016, Nature Methods.

[12]  Sarah A. Teichmann,et al.  Aging increases cell-to-cell transcriptional variability upon immune stimulation , 2017, Science.

[13]  Neil D. Lawrence,et al.  Joint Modelling of Confounding Factors and Prominent Genetic Regulators Provides Increased Accuracy in Genetical Genomics Studies , 2012, PLoS Comput. Biol..

[14]  S. Linnarsson,et al.  Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq , 2015, Science.

[15]  M. Hemberg,et al.  scmap: projection of single-cell RNA-seq data across data sets , 2018, Nature Methods.

[16]  N. Neff,et al.  Reconstructing lineage hierarchies of the distal lung epithelium using single cell RNA-seq , 2014, Nature.

[17]  Alex A. Pollen,et al.  Single‐cell sequencing maps gene expression to mutational phylogenies in PDGF‐ and EGF‐driven gliomas , 2016, Molecular Systems Biology.

[18]  Bo Ding,et al.  Normalization and noise reduction for single cell RNA-seq experiments , 2015, Bioinform..

[19]  P. Kharchenko,et al.  Bayesian approach to single-cell differential expression analysis , 2014, Nature Methods.

[20]  Alex A. Pollen,et al.  Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex , 2014, Nature Biotechnology.

[21]  Aleksandra A. Kolodziejczyk,et al.  Accounting for technical noise in single-cell RNA-seq experiments , 2013, Nature Methods.

[22]  S. Linnarsson,et al.  Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing , 2014, Nature Neuroscience.

[23]  S. Richardson,et al.  Beyond comparisons of means: understanding changes in gene expression at the single-cell level , 2016, Genome Biology.

[24]  R. Sandberg,et al.  Single-cell transcriptomics uncovers distinct molecular signatures of stem cells in chronic myeloid leukemia , 2017, Nature Medicine.

[25]  P. Linsley,et al.  MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data , 2015, Genome Biology.