Cell Subclass Identification in Single-Cell RNA-Sequencing Data Using Orthogonal Nonnegative Matrix Factorization

Identifying cell subclasses in single-cell RNA-sequencing (scRNA-Seq) data is important because it can uncover hidden biological processes within a cell population. Although the nonnegative matrix factorization (NMF) model has been reported to be effective in various unsupervised clustering tasks, it may still produce unsatisfactory clustering results for some scRNA-Seq datasets with heterogeneous structures. In this paper, we propose using an orthogonally constrained NMF (ONMF) model to identify cell subclasses in scRNA-Seq datasets. The ONMF model generally provides improved clustering performance but is challenging to solve. We present a computationally efficient algorithm based on variable splitting and the alternating direction method of multipliers (ADMM). On two scRNA-Seq datasets, we show that the proposed method can outperform existing methods in identifying cell subclasses and detecting key genes. Moreover, gene set enrichment analysis shows that the key genes identified by the proposed method are biologically significant.
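
The abstract names the model and the solver but not their exact form. As a minimal sketch, assume the ONMF problem min_{W,H} (1/2)||X - WH||_F^2 subject to W >= 0, H >= 0 and H H^T = I, where X is a nonnegative genes x cells matrix and each column of H scores one cell against k subclasses. The code splits the factors into auxiliary copies (A = W and B = H carry nonnegativity, C = H carries orthogonality) and runs scaled-form ADMM. The splitting, the fixed penalty `rho`, the farthest-first initialization, and the helper names `onmf_admm`, `_farthest_first_labels` and `top_genes` are illustrative assumptions, not the paper's published algorithm; because the problem is nonconvex, this ADMM carries no convergence guarantee.

```python
import numpy as np


def _farthest_first_labels(X, k):
    """Hypothetical initializer: pick k well-separated cells by farthest-first
    traversal, then assign every cell to its nearest pick."""
    cells = X.T                                   # n_cells x n_genes
    centers = [int(np.argmax(np.linalg.norm(cells, axis=1)))]
    d = np.linalg.norm(cells - cells[centers[0]], axis=1)
    for _ in range(1, k):
        nxt = int(np.argmax(d))                   # cell farthest from all chosen centers
        centers.append(nxt)
        d = np.minimum(d, np.linalg.norm(cells - cells[nxt], axis=1))
    dists = np.linalg.norm(cells[:, None, :] - cells[centers][None, :, :], axis=2)
    return dists.argmin(axis=1)


def onmf_admm(X, k, rho=1.0, n_iter=300):
    """Sketch (assumed formulation): min 0.5*||X - W H||_F^2
    s.t. W >= 0, H >= 0, H H^T = I, via splitting W = A, H = B, H = C
    and scaled-form ADMM. X is genes x cells. Returns the nonnegative
    split copies A (genes x k) and B (k x cells) as the factors."""
    labels0 = _farthest_first_labels(X, k)
    onehot = np.eye(k)[labels0].T                 # k x n cluster indicators
    H = onehot / np.sqrt(np.maximum(onehot.sum(axis=1, keepdims=True), 1.0))
    W = X @ H.T                                   # rows of H are orthonormal here
    A, B, C = W.copy(), H.copy(), H.copy()        # auxiliary (split) variables
    Ua, Ub, Uc = np.zeros_like(A), np.zeros_like(B), np.zeros_like(C)  # scaled duals
    I = np.eye(k)
    for _ in range(n_iter):
        # W-step: solve W (H H^T + rho I) = X H^T + rho (A - Ua)
        rhs = X @ H.T + rho * (A - Ua)
        W = np.linalg.solve(H @ H.T + rho * I, rhs.T).T
        # H-step: solve (W^T W + 2 rho I) H = W^T X + rho (B - Ub + C - Uc)
        H = np.linalg.solve(W.T @ W + 2 * rho * I,
                            W.T @ X + rho * (B - Ub + C - Uc))
        # A- and B-steps: Euclidean projection onto the nonnegative orthant
        A = np.maximum(W + Ua, 0.0)
        B = np.maximum(H + Ub, 0.0)
        # C-step: nearest matrix with orthonormal rows (orthogonal Procrustes:
        # if H + Uc = P S Qt, the minimizer is P Qt)
        P, _, Qt = np.linalg.svd(H + Uc, full_matrices=False)
        C = P @ Qt
        # Dual updates for the three splitting constraints
        Ua += W - A
        Ub += H - B
        Uc += H - C
    return A, B


def top_genes(W, cluster, n=10):
    """Hypothetical key-gene rule: genes with the largest loadings in column `cluster` of W."""
    return np.argsort(-W[:, cluster])[:n]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = 0.1 + rng.uniform(0.0, 0.1, size=(30, 60))      # toy genes x cells matrix
    for c in range(3):
        X[10 * c:10 * c + 10, 20 * c:20 * c + 20] += 5.0  # each block marks one subclass
    W, H = onmf_admm(X, k=3)
    labels = H.argmax(axis=0)                             # one subclass label per cell
    print(labels)
    print(top_genes(W, labels[0]))
```

Subclass labels are read off as the largest entry in each column of H, and ranking a column of W gives candidate key genes for that subclass. The orthogonality constraint is what pushes each column of H toward a single dominant entry, which is why ONMF behaves like a hard clustering. On real data, `rho` would likely need tuning to the scale of X.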