NDRindex: A method for the quality assessment of single-cell RNA-Seq preprocessing data

Background: Single-cell RNA sequencing (scRNA-Seq) can be used to determine cell types in an unbiased way. A typical scRNA-Seq analysis pipeline consists of data normalization, dimension reduction, and unsupervised clustering. However, the choice of normalization and dimension reduction methods significantly influences the results of clustering and cell type enrichment analysis. Choosing a preprocessing path is therefore crucial in scRNA-Seq data mining: an appropriate path extracts more of the important information from complex raw data and leads to a more accurate clustering result. Results: We propose a method called NDRindex (Normalization and Dimensionality Reduction index) to evaluate the quality of preprocessed single-cell RNA-Seq data. The method includes a function that calculates the degree of aggregation of the data, which is the key to benchmarking data quality before clustering. On the five single-cell RNA sequencing data sets we tested, the results show the effectiveness and accuracy of the index. Conclusions: The method we introduce fills a gap in the selection of preprocessing paths, and the results demonstrate its effectiveness and accuracy. Our study provides a useful indicator for RNA-Seq data assessment.
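The idea of scoring each preprocessing path before clustering can be illustrated with a minimal Python sketch. It is not the NDRindex formula, which the abstract does not state: `aggregation_score` is a stand-in proxy (one minus the ratio of mean nearest-neighbour distance to mean pairwise distance), and the normalizers, the reducer, and all function names are hypothetical choices made only for illustration.

```python
import itertools
import math
import statistics


def cpm_normalize(counts):
    """Scale each cell (row) to counts per million."""
    return [[v * 1e6 / (sum(row) or 1) for v in row] for row in counts]


def log_normalize(counts):
    """Counts per million followed by log(1 + x)."""
    return [[math.log1p(v) for v in row] for row in cpm_normalize(counts)]


def identity(matrix):
    """No dimension reduction: keep every gene."""
    return matrix


def top_variance_genes(matrix, k=2):
    """Crude dimension reduction: keep the k highest-variance genes."""
    n_genes = len(matrix[0])
    variances = [statistics.pvariance([row[g] for row in matrix])
                 for g in range(n_genes)]
    keep = sorted(range(n_genes), key=lambda g: variances[g], reverse=True)[:k]
    return [[row[g] for g in keep] for row in matrix]


def aggregation_score(points):
    """Stand-in aggregation measure (NOT the published NDRindex).

    Returns 1 - mean nearest-neighbour distance / mean pairwise distance.
    Higher values mean the cells sit in tighter groups. Needs >= 2 points.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    nearest = [min(d for j, d in enumerate(row) if j != i)
               for i, row in enumerate(dist)]
    pairwise = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    mean_pairwise = statistics.mean(pairwise)
    if mean_pairwise == 0:
        return 0.0
    return 1.0 - statistics.mean(nearest) / mean_pairwise


def rank_paths(counts, normalizers, reducers):
    """Score every normalization x reduction path, best score first."""
    scores = {}
    for (n_name, norm), (r_name, reduce_fn) in itertools.product(
            normalizers.items(), reducers.items()):
        scores[(n_name, r_name)] = aggregation_score(reduce_fn(norm(counts)))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical candidate methods for each pipeline stage.
NORMALIZERS = {"cpm": cpm_normalize, "log": log_normalize}
REDUCERS = {"all_genes": identity, "top2_var": top_variance_genes}

# Toy count matrix: 6 cells (rows) x 4 genes (columns), two obvious groups.
toy_counts = [
    [90, 5, 3, 2], [85, 8, 4, 3], [88, 6, 2, 4],
    [4, 3, 92, 1], [6, 2, 87, 5], [3, 5, 90, 2],
]

ranking = rank_paths(toy_counts, NORMALIZERS, REDUCERS)
for (norm_name, red_name), score in ranking:
    print(f"{norm_name:>4} + {red_name:<10} score={score:.3f}")
```

The path at the top of `ranking` would then be the one passed on to clustering. Any real use would replace the toy methods with actual normalization and dimension reduction tools and the proxy score with the index defined in the paper.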
