Identifying cell types from single-cell data based on similarities and dissimilarities between cells

Background With the development of the technology of single-cell sequence, revealing homogeneity and heterogeneity between cells has become a new area of computational systems biology research. However, the clustering of cell types becomes more complex with the mutual penetration between different types of cells and the instability of gene expression. One way of overcoming this problem is to group similar, related single cells together by the means of various clustering analysis methods. Although some methods such as spectral clustering can do well in the identification of cell types, they only consider the similarities between cells and ignore the influence of dissimilarities on clustering results. This methodology may limit the performance of most of the conventional clustering algorithms for the identification of clusters, it needs to develop special methods for high-dimensional sparse categorical data. Results Inspired by the phenomenon that same type cells have similar gene expression patterns, but different types of cells evoke dissimilar gene expression patterns, we improve the existing spectral clustering method for clustering single-cell data that is based on both similarities and dissimilarities between cells. The method first measures the similarity/dissimilarity among cells, then constructs the incidence matrix by fusing similarity matrix with dissimilarity matrix, and, finally, uses the eigenvalues of the incidence matrix to perform dimensionality reduction and employs the K-means algorithm in the low dimensional space to achieve clustering. The proposed improved spectral clustering method is compared with the conventional spectral clustering method in recognizing cell types on several real single-cell RNA-seq datasets. Conclusions In summary, we show that adding intercellular dissimilarity can effectively improve accuracy and achieve robustness and that improved spectral clustering method outperforms the traditional spectral clustering method in grouping cells.

[1]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[2]  Chris Williams,et al.  RNA-SeQC: RNA-seq metrics for quality control and process optimization , 2012, Bioinform..

[3]  Z. Bar-Joseph,et al.  Using neural networks for reducing the dimensions of single-cell RNA-Seq data , 2017, Nucleic acids research.

[4]  Paul Geladi,et al.  Principal Component Analysis , 1987, Comprehensive Chemometrics.

[5]  Aleksandra A. Kolodziejczyk,et al.  Single Cell RNA-Sequencing of Pluripotent States Unlocks Modular Transcriptional Variation , 2015, Cell stem cell.

[6]  Thomas Höfer,et al.  Robust classification of single-cell transcriptome data by nonnegative matrix factorization , 2017, Bioinform..

[7]  Michael Poidinger,et al.  Identification of cDC1- and cDC2-committed DC progenitors reveals early lineage priming at the common DC progenitor stage in the bone marrow , 2015, Nature Immunology.

[8]  S. Linnarsson,et al.  Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing , 2014, Nature Neuroscience.

[9]  Fabian J Theis,et al.  Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells , 2015, Nature Biotechnology.

[10]  Fang-Xiang Wu,et al.  Dynamic Model-based Clustering for Time-course Gene Expression Data , 2005, J. Bioinform. Comput. Biol..

[11]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[12]  Li-Ping Tian,et al.  CASNMF: A Converged Algorithm for symmetrical nonnegative matrix factorization , 2018, Neurocomputing.

[13]  Shuicheng Yan,et al.  Convex Sparse Spectral Clustering: Single-View to Multi-View , 2015, IEEE Transactions on Image Processing.

[14]  Rhonda Bacher,et al.  Design and computational analysis of single-cell RNA-sequencing experiments , 2016, Genome Biology.

[15]  Alex A. Pollen,et al.  Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex , 2014, Nature Biotechnology.

[16]  S. C. Johnson Hierarchical clustering schemes , 1967, Psychometrika.

[17]  Rudiyanto Gunawan,et al.  CALISTA: Clustering and LINEAGE Inference in Single-Cell Transcriptional Analysis , 2018, bioRxiv.

[18]  Evan Z. Macosko,et al.  Comprehensive Classification of Retinal Bipolar Neurons by Single-Cell Transcriptomics , 2016, Cell.

[19]  Seyoung Park,et al.  Spectral clustering based on learning similarity matrix , 2018, Bioinform..

[20]  M. Schaub,et al.  SC3 - consensus clustering of single-cell RNA-Seq data , 2016, Nature Methods.

[21]  Shaoqiang Zhang,et al.  Consensus clustering of single-cell RNA-seq data by enhancing network affinity , 2021, Briefings Bioinform..

[22]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[23]  Yi Pan,et al.  SinNLRR: a robust subspace clustering method for cell type detection by non-negative and low-rank representation , 2019, Bioinform..

[24]  Jeong Eon Lee,et al.  Single-cell RNA-seq enables comprehensive tumour and immune cell profiling in primary breast cancer , 2017, Nature Communications.

[25]  Hao Jiang,et al.  Single cell clustering based on cell‐pair differentiability correlation and variance analysis , 2018, Bioinform..

[26]  Lin Wu,et al.  A Fast Algorithm for Nonnegative Matrix Factorization and Its Convergence , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[27]  Bo Wang,et al.  Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning , 2016, Nature Methods.

[28]  Chen Xu,et al.  Identification of cell types from single-cell transcriptomes using a novel clustering method , 2015, Bioinform..

[29]  M. Hemberg,et al.  Challenges in unsupervised clustering of single-cell RNA-seq data , 2019, Nature Reviews Genetics.

[30]  Xiaoshu Zhu,et al.  A Hybrid Clustering Algorithm for Identifying Cell Types from Single-Cell RNA-Seq Data , 2019, Genes.

[31]  Jialong Liang,et al.  Single-cell sequencing technologies: current and future. , 2014, Journal of genetics and genomics = Yi chuan xue bao.