A Personalized Low-Rank Subspace Clustering Method Based on Locality and Similarity Constraints for scRNA-seq Data Analysis

Single-cell RNA sequencing (scRNA-seq) technology provides expression profiles of individual cells, which propels biological research into a new chapter. Clustering individual cells based on their transcriptomes is a critical objective of scRNA-seq data analysis. However, the high-dimensional, sparse and noisy nature of scRNA-seq data poses a challenge to single-cell clustering, so clustering methods that target these characteristics are needed. Owing to its powerful subspace learning capability and robustness to noise, the subspace segmentation method based on low-rank representation (LRR) is widely used in clustering research and achieves satisfactory results. In view of this, we propose a personalized low-rank subspace clustering method, namely PLRLS, to learn more accurate subspace structures from both global and local perspectives. Specifically, we first introduce a local structure constraint to capture the local structure information of the data, which also helps our method obtain better inter-cluster separability and intra-cluster compactness. Then, to retain the important similarity information that is ignored by the LRR model, we use the fractional function to extract similarity information between cells and introduce this information into the LRR framework as a similarity constraint. The fractional function is an efficient similarity measure designed for scRNA-seq data, which has theoretical and practical implications. Finally, based on the LRR matrix learned by PLRLS, we perform downstream analyses on real scRNA-seq datasets, including spectral clustering, visualization and marker gene identification. Comparative experiments show that the proposed method achieves superior clustering accuracy and robustness.
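
To make the role of the LRR matrix concrete, the following is a minimal sketch of the plain, noise-free LRR baseline only, not of PLRLS: it includes neither the local structure constraint nor the fractional-function similarity constraint. It assumes a genes-by-cells matrix X and a known rank, and uses the well-known closed-form minimizer of the nuclear norm of Z subject to X = XZ, namely Z = V V^T built from the right singular vectors of X. The function names and the symmetrization of |Z| into an affinity matrix are illustrative choices, not taken from the paper.

```python
import numpy as np


def lrr_noise_free(X, rank):
    """Closed-form solution of  min ||Z||_*  s.t.  X = X Z  (noise-free LRR).

    X    : (n_genes, n_cells) expression matrix, one cell per column.
    rank : assumed rank of X (hypothetical parameter for this sketch).
    Returns the (n_cells, n_cells) representation matrix Z = V_r V_r^T.
    """
    # Skinny SVD: rows of vt are the right singular vectors of X.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    v_r = vt[:rank].T          # (n_cells, rank)
    return v_r @ v_r.T


def affinity_from_representation(Z):
    """Symmetric, non-negative cell-to-cell affinity built from Z."""
    return (np.abs(Z) + np.abs(Z.T)) / 2.0
```

When the cells are drawn from independent subspaces, this Z is block-diagonal up to a permutation of the cells, so the resulting affinity matrix can be handed to any spectral clustering routine that accepts a precomputed affinity. Real scRNA-seq data are noisy, which is why LRR-based methods normally add an error term and solve the problem iteratively rather than in closed form.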
