A Global Similarity Learning for Clustering of Single-Cell RNA-Seq Data

Single-cell RNA-seq (scRNA-seq) data analysis is a powerful tool for biological research, and the similarity measurement plays an important role in clustering scRNA-seq data. Existing similarity measurements are mainly based on local distance information, calculated between directly connected node pairs, or on shared nearest-neighbour information; they do not consider global information. As a result, these measurements may be inaccurate because they rely on insufficient information. Based on multi-kernel indices in a global feature space and on path-based similarity, we propose a new similarity measurement for single-cell clustering, called multi-kernel and path-based global similarity (MPGS). In MPGS, global information is incorporated through a new feature space derived from the Spearman correlation coefficient and through a global similarity matrix calculated with multiple kernels. A path-based similarity metric is designed to expand the range of relevant nodes. Based on this similarity, a modified Louvain community detection method, named MPGS-Louvain, is applied to cluster the scRNA-seq data. To validate MPGS, we compared the clustering performance of several clustering methods combined with different similarity measurements. To evaluate MPGS-Louvain, we compared it with five scRNA-seq clustering methods on twenty scRNA-seq datasets. The experimental results showed that MPGS outperformed the other similarity measurements and that MPGS-Louvain achieved better performance on these datasets. MPGS thus offers a new way to improve the accuracy of scRNA-seq clustering by considering global information in the similarity measurement, and MPGS-Louvain detected clusters automatically and accurately without prior knowledge.
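
The three similarity ingredients named above (a Spearman-based feature space, a multi-kernel global similarity matrix, and a path-based extension) can be illustrated with a minimal sketch. The abstract does not give the MPGS formulas, so everything specific here is an assumption made for illustration: the Gaussian kernels and their bandwidths `sigmas`, the damped two-step path term with weight `eps`, and all function names. The modified Louvain step is omitted.

```python
import math

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def spearman_features(expr):
    """Row i holds the Spearman correlation of cell i with every cell,
    so each cell is described relative to the whole dataset."""
    rk = [ranks(cell) for cell in expr]
    return [[pearson(a, b) for b in rk] for a in rk]

def multi_kernel_similarity(feats, sigmas=(0.5, 1.0, 2.0)):
    """Average of Gaussian kernels over several bandwidths.
    Kernel family and bandwidths are assumptions, not the paper's choice."""
    n = len(feats)
    d2 = [[sum((a - b) ** 2 for a, b in zip(feats[i], feats[j]))
           for j in range(n)] for i in range(n)]
    return [[sum(math.exp(-d2[i][j] / (2 * s * s)) for s in sigmas) / len(sigmas)
             for j in range(n)] for i in range(n)]

def path_based_similarity(sim, eps=0.1):
    """W + eps * W^2: direct similarity plus damped two-step paths.
    This specific form is an assumption used only for illustration."""
    n = len(sim)
    w = [[0.0 if i == j else sim[i][j] for j in range(n)] for i in range(n)]
    cols = list(zip(*w))
    w2 = [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in w]
    return [[w[i][j] + eps * w2[i][j] for j in range(n)] for i in range(n)]

# Toy data: 4 cells x 5 genes; cells 0-1 and cells 2-3 form two groups.
cells = [
    [9, 8, 7, 1, 0],
    [8, 9, 6, 0, 1],
    [0, 1, 2, 9, 8],
    [1, 0, 3, 8, 9],
]
feats = spearman_features(cells)
sim = path_based_similarity(multi_kernel_similarity(feats))
```

On this toy input, cells in the same group receive a higher similarity than cells in different groups. The resulting matrix `sim` could then serve as a weighted graph for a community detection method.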
