Clustering and classification methods for single-cell RNA-sequencing data

Measuring the similarity between single-cell RNA-sequencing (scRNA-seq) profiles is a ubiquitous task in bioinformatics, but processing scRNA-seq data with a single clustering or classification method is generally difficult. This has led to the emergence of integrated methods and tools that aim to automatically handle specific problems associated with scRNA-seq data. These approaches have attracted considerable interest in bioinformatics and related fields. In this paper, we systematically review these integrated methods and tools, highlighting the pros and cons of each approach. We pay particular attention to clustering and classification methods, and we also discuss methods that have recently emerged as powerful alternatives, including linear and nonlinear methods and dimensionality reduction methods. Finally, we focus on clustering and classification methods for scRNA-seq data, in particular integrated methods, and provide a comprehensive description of scRNA-seq datasets together with their download URLs.
