Truncated Robust Principal Component Analysis and Noise Reduction for Single Cell RNA-seq Data

The development of single cell RNA sequencing (scRNA-seq) has enabled innovative approaches to investigating mRNA abundances. To extract systematic patterns from scRNA-seq data in an unsupervised manner, we have developed two extensions of robust principal component analysis (RPCA). First, we present a truncated version of RPCA (tRPCA) that is much faster and more memory efficient. Second, we introduce noise reduction into tRPCA through \(L_2\) regularization (tRPCAL2). Unlike RPCA, which considers only a low-rank matrix L and a sparse matrix S, the proposed method can also extract a matrix E of the noise inherent in modern genomic data. We demonstrate the usefulness of these methods by applying them to peripheral blood mononuclear cell (PBMC) scRNA-seq data. In particular, clustering the low-rank matrix L yields better classification of unlabeled single cells. Overall, the proposed variants are well suited to the high-dimensional, noisy data routinely generated in genomics.
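To make the three-part decomposition concrete, the following is a minimal illustrative sketch in Python, not the tRPCA or tRPCAL2 algorithm itself, whose details are not given in this abstract. It assumes a simple alternating scheme: L is the best rank-k approximation of Y - S, S is obtained by soft-thresholding Y - L, and E is whatever remains. The function names, the rank `k`, the threshold `lam`, and the iteration count are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(x, tau):
    """Shrink every entry of x toward zero by tau (elementwise L1 proximal step)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def rank_k_approx(m, k):
    """Best rank-k approximation of m, built from its k largest singular triplets."""
    # A full SVD is used here for simplicity; a truncated solver would
    # compute only the k leading triplets and is what makes truncation fast.
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k]


def decompose(y, k=2, lam=None, n_iter=50):
    """Split y into low-rank L, sparse S, and residual noise E with y = L + S + E.

    Assumed scheme (illustrative only): alternate a rank-k fit and a
    soft-thresholding step, then take E as the remainder.
    """
    y = np.asarray(y, dtype=float)
    if lam is None:
        # Common RPCA default scale; an assumption, not taken from the abstract.
        lam = 1.0 / np.sqrt(max(y.shape))
    s = np.zeros_like(y)
    low = np.zeros_like(y)
    for _ in range(n_iter):
        low = rank_k_approx(y - s, k)      # update low-rank part
        s = soft_threshold(y - low, lam)   # update sparse part
    e = y - low - s                        # small dense residual (noise)
    return low, s, e
```

In this sketch the entries of E are bounded in magnitude by `lam`, because anything larger is absorbed into S by the thresholding step. Downstream clustering of cells would then operate on the returned low-rank matrix rather than on the raw counts.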
