Locally Manifold Non-negative Matrix Factorization Based on Centroid for scRNA-seq Data Analysis

The rapid development of single-cell RNA sequencing (scRNA-seq) has made it possible to study the association between cells and genes at molecular resolution. Because single-cell sequencing measures a very large number of genes, it is often difficult to extract cell information from the resulting high-dimensional space, which leads to inaccurate results in follow-up analysis. To address this problem, we propose locally manifold non-negative matrix factorization based on centroid (MNMFC) for scRNA-seq data analysis. MNMFC is a similarity modeling scheme based on local manifold structure that captures associations between cells in high-dimensional space. By combining locally manifold similarity learning with the non-negative matrix factorization (NMF) algorithm, it maps data from high-dimensional space to low-dimensional space, which supports downstream clustering analysis. The performance of the model was validated experimentally on 10 scRNA-seq datasets. Compared with nine other advanced single-cell clustering methods, MNMFC achieved encouraging results both in the comprehensive analysis across datasets and in the analysis of individual datasets.
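To make the dimensionality-reduction step concrete, the following is a minimal sketch of plain NMF with the standard multiplicative update rules, applied to a synthetic genes-by-cells count matrix. It is not the MNMFC method: the locally manifold similarity term and the centroid component described above are omitted, and the function name `nmf_multiplicative`, the rank, the iteration count, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def nmf_multiplicative(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative matrix X (genes x cells) as W @ H.

    Plain NMF with multiplicative updates for the Frobenius objective.
    Hypothetical helper for illustration; no manifold or centroid terms.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    # Random non-negative initialization of both factors.
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_cells)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors


# Synthetic stand-in for an expression matrix: 50 genes x 30 cells.
rng = np.random.default_rng(1)
X = rng.poisson(2.0, size=(50, 30)).astype(float)

W, H, errors = nmf_multiplicative(X, rank=3)

# Each column of H is a 3-dimensional embedding of one cell; as a crude
# stand-in for downstream clustering, assign each cell to its largest component.
labels = H.argmax(axis=0)
```

In this sketch each cell is represented by a column of `H`, so a 50-dimensional expression profile is reduced to 3 coordinates. A real pipeline would typically pass the low-dimensional representation to a dedicated clustering algorithm rather than use the `argmax` shortcut shown here.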
