scRCMF: Identification of Cell Subpopulations and Transition States From Single-Cell Transcriptomes

Single-cell technologies provide an unprecedented opportunity to explore the heterogeneity of a biological process at the level of individual cells. One major challenge in analyzing single-cell data is to identify cell subpopulations, stable cell states, and cells in transition between states. To elucidate the transition mechanisms in cell fate dynamics, it is highly desirable to characterize both stable cellular states and intermediate states quantitatively. Here, we present scRCMF, an unsupervised method that identifies stable cell states and transition cells using a nonlinear optimization model that infers latent substructures from a gene-cell matrix. We incorporate a random coefficient matrix-based regularization into the standard nonnegative matrix decomposition model to improve the reliability and stability of the estimated latent substructures. To quantify the transition capability of each cell, we propose two new measures: single-cell transition entropy (scEntropy) and transition probability (scTP). When applied to two simulated and three published scRNA-seq datasets, scRCMF not only captures multiple subpopulations and transition processes in large-scale data, but also identifies transition states and some known marker genes associated with cell state transitions and subpopulations. Furthermore, scEntropy is significantly higher for transition cells than for cells in other states during global differentiation, and scTP predicts the “fate decisions” of transition cells during the transition. The present study provides new insights into transition events during differentiation and development.
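The general idea of factorizing a gene-cell matrix and scoring each cell by how evenly it loads on the latent states can be sketched as follows. This is a minimal illustration only, not the scRCMF model: it uses plain nonnegative matrix factorization with multiplicative updates and omits the random coefficient matrix regularization, and it uses the Shannon entropy of each cell's normalized loadings as a stand-in for scEntropy, whose exact definition is not given in this abstract. The function names (`nmf`, `membership_entropy`) and the toy data are assumptions made for the example.

```python
import numpy as np


def nmf(X, k, n_iter=500, seed=0, eps=1e-10):
    """Plain NMF, X ~ W @ H, via multiplicative updates (no regularization)."""
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_cells)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # Fix the scale ambiguity: make each column of W sum to 1.
    scale = W.sum(axis=0) + eps
    return W / scale, H * scale[:, None]


def membership_entropy(H, eps=1e-12):
    """Shannon entropy of each cell's normalized loadings (stand-in score)."""
    P = H / (H.sum(axis=0, keepdims=True) + eps)
    return -(P * np.log(P + eps)).sum(axis=0), P


# Toy gene-cell matrix (hypothetical data): two states with disjoint
# marker-gene blocks, plus cells that are an even mixture of both states.
rng = np.random.default_rng(1)
W_true = np.zeros((20, 2))
W_true[:10, 0] = rng.uniform(1.0, 5.0, size=10)   # markers of state A
W_true[10:, 1] = rng.uniform(1.0, 5.0, size=10)   # markers of state B
labels = np.array(["A"] * 15 + ["B"] * 15 + ["mixed"] * 10)
H_true = np.array([{"A": [1.0, 0.0], "B": [0.0, 1.0], "mixed": [0.5, 0.5]}[s]
                   for s in labels]).T
X = W_true @ H_true

W, H = nmf(X, k=2)
entropy, P = membership_entropy(H)

is_mixed = labels == "mixed"
print("mean entropy, stable cells:", entropy[~is_mixed].mean())
print("mean entropy, mixed cells: ", entropy[is_mixed].mean())
```

In this toy setting, cells that load on a single latent state should score low, while cells that load on both should score higher, which mirrors the qualitative behavior the abstract reports for transition cells.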
