Visualizing High-Dimensional Single-Cell RNA-seq Data via Random Projections and Geodesic Distances

Recent advances in next-generation sequencing have created a vast data source with great potential for elucidating complex disease mechanisms and biological processes. One such technology is single-cell RNA sequencing (scRNA-seq), which enables transcriptomic measurements in individual cells and has shown promising results. However, such studies measure the entire genome for thousands of cells, producing datasets of extremely high dimensionality and complexity. To address this, we propose a dimensionality reduction approach, called RGt-SNE, which visualizes single-cell RNA-seq data in two dimensions. First, RGt-SNE defines a cell-cell distance matrix based on Random Projections and Geodesic Distances. The former are used to define pairwise cell distances in a low-dimensional projected space, avoiding the difficulties that arise in data with ultra-high dimensionality. The latter are used to better define the large pairwise cell distances. Subsequently, the t-SNE method is applied to the customized distance matrix to obtain a two-dimensional visualization. RGt-SNE was evaluated on two real experimental single-cell RNA-seq datasets against three well-known methods: t-SNE, multidimensional scaling (MDS), and ISOMAP. The results indicate that RGt-SNE outperforms these methods, suggesting it as a reliable tool for single-cell RNA-seq data analysis and visualization.
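
The distance-matrix stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the two ingredients are combined, so the sketch assumes a Gaussian random projection, a symmetric k-nearest-neighbour graph, and ISOMAP-style shortest-path (geodesic) distances for all cell pairs. The function names, the parameters `target_dim`, `k`, and `seed`, and the toy data are all hypothetical.

```python
import math
import random


def random_projection(data, target_dim, seed=0):
    """Project each row of `data` onto `target_dim` random Gaussian directions."""
    rng = random.Random(seed)
    n_features = len(data[0])
    scale = 1.0 / math.sqrt(target_dim)
    # One random direction per output dimension (assumed Gaussian entries).
    directions = [[rng.gauss(0.0, 1.0) * scale for _ in range(n_features)]
                  for _ in range(target_dim)]
    return [[sum(x * w for x, w in zip(row, d)) for d in directions]
            for row in data]


def euclidean_matrix(points):
    """Dense matrix of pairwise Euclidean distances."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)]
            for i in range(n)]


def geodesic_matrix(dist, k):
    """Shortest-path distances over a symmetric k-nearest-neighbour graph."""
    n = len(dist)
    geo = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        geo[i][i] = 0.0
        # Keep edges from cell i to its k nearest neighbours (excluding itself).
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: dist[i][j])[:k]
        for j in nearest:
            geo[i][j] = geo[j][i] = dist[i][j]
    # Floyd-Warshall: O(n^3), acceptable only for small illustrative inputs.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = geo[i][m] + geo[m][j]
                if via < geo[i][j]:
                    geo[i][j] = via
    return geo


def rg_distance_matrix(data, target_dim=10, k=5, seed=0):
    """Geodesic cell-cell distances computed in a randomly projected space."""
    projected = random_projection(data, target_dim, seed)
    return geodesic_matrix(euclidean_matrix(projected), k)


# Toy input: 12 hypothetical "cells" with 50 "genes" each.
toy_rng = random.Random(1)
cells = [[toy_rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(12)]
D = rg_distance_matrix(cells, target_dim=5, k=4, seed=0)
```

The resulting matrix `D` would then be passed to a t-SNE implementation that accepts precomputed distances, for example scikit-learn's `TSNE` with `metric="precomputed"` and `init="random"`. If the neighbour graph is disconnected, some entries of `D` are infinite and would need handling (for instance a larger `k`) before the embedding step.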
