CNLLRR: A Novel Low-Rank Representation Method for Single-cell RNA-seq Data Analysis

The development of single-cell RNA-sequencing (scRNA-seq) technology has enabled the measurement of gene expression in individual cells, providing an unprecedented opportunity to explore biological mechanisms at the cellular level. However, existing scRNA-seq analysis methods are susceptible to noise and outliers or ignore the manifold structure inherent in the data. In this paper, a novel method called Cauchy non-negative Laplacian regularized low-rank representation (CNLLRR) is proposed to alleviate these problems. Specifically, we apply the Cauchy loss function (CLF) to the noise matrix of CNLLRR in place of the conventional norm constraints, which enhances the robustness of the method. In addition, a graph regularization term is added to the objective function to capture the pairwise geometric relationships between cells. The alternating direction method of multipliers (ADMM) is then adopted to solve the CNLLRR optimization problem. Finally, extensive experiments on scRNA-seq data show that the proposed CNLLRR method outperforms other state-of-the-art methods in cell clustering, cell visualization, and prioritization of gene markers. CNLLRR contributes to understanding the heterogeneity between cell populations in complex biological systems.

Author summary

Analysis of single-cell data can help to further study the heterogeneity and complexity of cell populations. Current analysis methods mainly learn the similarity between cells and then apply a clustering algorithm to the resulting similarity matrix for cell clustering or downstream analysis. Therefore, constructing an accurate cell-to-cell similarity matrix is crucial for single-cell data analysis. In this paper, we design a novel Cauchy non-negative Laplacian regularized low-rank representation (CNLLRR) method to obtain a better similarity matrix. Specifically, a Cauchy loss function (CLF) constraint is applied to penalize the noise matrix, which improves the robustness of CNLLRR to noise and outliers. Moreover, a graph regularization term is added to the objective function, which effectively encodes the local manifold information of the data. Together, these guarantee the quality of the learned cell-to-cell similarity matrix. Finally, single-cell data analysis experiments show that our method is superior to other representative methods.
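
As a minimal sketch of the two ingredients described above, the Python snippet below contrasts a Cauchy loss with a squared loss on a residual matrix containing one outlier, and evaluates a graph Laplacian regularizer on a toy representation matrix. The text here does not state the formulas, so everything in the snippet is an assumption made for illustration: the common form ln(1 + e^2/c^2) for the Cauchy loss, the scale parameter `c`, the binary k-nearest-neighbor graph, the trace form tr(Z L Z^T), and all function names. It is not the authors' implementation, and it omits the low-rank term, the non-negativity constraint, and the ADMM solver.

```python
import numpy as np


def cauchy_loss(E, c=1.0):
    """Assumed Cauchy loss: sum of ln(1 + (e / c)^2) over all entries of E."""
    return float(np.sum(np.log1p((E / c) ** 2)))


def knn_graph_laplacian(X, k=3):
    """Unnormalized Laplacian L = D - W of a symmetric binary kNN graph.

    X holds one cell per column (genes x cells). The binary weighting is an
    illustrative choice, not taken from the paper.
    """
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)  # squared distances
    np.fill_diagonal(dist, np.inf)                      # exclude self-matches
    nearest = np.argsort(dist, axis=1)[:, :k]           # k closest cells per cell
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)                              # symmetrize the graph
    L = np.diag(W.sum(axis=1)) - W
    return L, W


def graph_regularizer(Z, L):
    """tr(Z L Z^T), equal to 0.5 * sum_ij W_ij * ||z_i - z_j||^2 over columns."""
    return float(np.trace(Z @ L @ Z.T))


# Toy data: 20 genes x 10 cells, plus a residual matrix with a single outlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))
E = np.zeros((20, 10))
E[0, 0] = 100.0

squared = float(np.sum(E ** 2))   # grows quadratically with the outlier
robust = cauchy_loss(E)           # grows only logarithmically

L, W = knn_graph_laplacian(X, k=3)
Z = rng.random((10, 10))          # hypothetical non-negative representation
penalty = graph_regularizer(Z, L)

print(f"squared loss: {squared:.1f}, Cauchy loss: {robust:.3f}")
print(f"graph regularizer: {penalty:.3f}")
```

The comparison illustrates why a logarithmic penalty is less sensitive to a single large residual than a squared one, and the regularizer is small when cells that are neighbors in the graph receive similar representation columns.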
