PBLR: an accurate single cell RNA-seq data imputation tool considering cell heterogeneity and prior expression level of dropouts

Single-cell RNA sequencing (scRNA-seq) provides a powerful tool for determining precise expression patterns in tens of thousands of individual cells and for deciphering cell heterogeneity and cell subpopulations. However, scRNA-seq data analysis remains challenging because of various sources of technical noise, e.g., the presence of dropout events (i.e., excess zero counts). Taking into account cell heterogeneity and the structural effect of expression level on dropout rate, we propose a novel method named PBLR to accurately impute the dropouts in scRNA-seq data. PBLR effectively recovers dropout events on both simulated and real scRNA-seq datasets and, compared with several state-of-the-art methods, dramatically improves low-dimensional representation and the recovery of gene-gene relationships masked by dropout events. Moreover, PBLR also detects accurate and robust cell subpopulations automatically, shedding light on its flexibility and generality for scRNA-seq data analysis.
