deepMc: deep Matrix Completion for imputation of single cell RNA-seq data

Single cell RNA-seq has fueled discovery and innovation in medicine over the past few years and is useful for studying cellular responses at individual cell resolution. But, due to paucity of starting RNA, the data acquired is highly sparse. To address this, We propose a deep matrix factorization based method, deepMc, to impute missing values in gene-expression data. For the deep architecture of our approach, We draw our motivation from great success of deep learning in solving various Machine learning problems. In this work, We support our method with positive results on several evaluation metrics like clustering of cell populations, differential expression analysis and cell type separability.

[1]  Emmanuel J. Candès,et al.  The Power of Convex Relaxation: Near-Optimal Matrix Completion , 2009, IEEE Transactions on Information Theory.

[2]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[3]  Lei Huang,et al.  Matrix completion based direction-of-arrival estimation in nonuniform noise , 2016, 2016 IEEE International Conference on Digital Signal Processing (DSP).

[4]  George Trigeorgis,et al.  A Deep Matrix Factorization Method for Learning Attribute Representations , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Aleksandra A. Kolodziejczyk,et al.  The technology and biology of single-cell RNA sequencing. , 2015, Molecular cell.

[6]  Matthew E. Ritchie,et al.  limma powers differential expression analyses for RNA-sequencing and microarray studies , 2015, Nucleic acids research.

[7]  Fabio Luciani,et al.  Impact of sequencing depth and read length on single cell RNA sequencing data of T cells , 2017, Scientific Reports.

[8]  Jinhui Tang,et al.  Weakly Supervised Deep Matrix Factorization for Social Image Understanding , 2017, IEEE Transactions on Image Processing.

[9]  Catalin C. Barbacioru,et al.  Tracing the Derivation of Embryonic Stem Cells from the Inner Cell Mass by Single-Cell RNA-Seq Analysis , 2010, Cell stem cell.

[10]  Gil Alterovitz,et al.  Gene expression prediction using low-rank matrix completion , 2016, BMC Bioinformatics.

[11]  Charity W. Law,et al.  voom: precision weights unlock linear model analysis tools for RNA-seq read counts , 2014, Genome Biology.

[12]  Wei Vivian Li,et al.  scImpute: accurate and robust imputation for single cell RNA-seq data , 2017, bioRxiv.

[13]  A. Regev,et al.  Revealing the vectors of cellular identity with single-cell genomics , 2016, Nature Biotechnology.

[14]  L. J. K. Wee,et al.  Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors , 2017, Nature Genetics.

[15]  Ruiqiang Li,et al.  Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells , 2013, Nature Structural &Molecular Biology.

[16]  F. Biase,et al.  Cell fate inclination within 2-cell and 4-cell mouse embryos revealed by single-cell RNA sequencing , 2014, Genome research.

[17]  Kevin R. Moon,et al.  MAGIC: A diffusion-based imputation method reveals gene-gene interactions in single-cell RNA-sequencing data , 2017, bioRxiv.

[18]  Pablo A. Parrilo,et al.  Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization , 2007, SIAM Rev..

[19]  P. Kharchenko,et al.  Bayesian approach to single-cell differential expression analysis , 2014, Nature Methods.

[20]  Shawn M. Gillespie,et al.  Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma , 2014, Science.

[21]  Il-Youp Kwak,et al.  DrImpute: imputing dropout events in single cell RNA sequencing data , 2017, BMC Bioinformatics.

[22]  Mark D. Robinson,et al.  Robustly detecting differential expression in RNA sequencing data using observation weights , 2013, Nucleic acids research.

[23]  David A. Knowles,et al.  Batch effects and the effective design of single-cell gene expression studies , 2016, Scientific Reports.

[24]  Grace X. Y. Zheng,et al.  Massively parallel digital transcriptional profiling of single cells , 2016, Nature Communications.

[25]  Ivan Markovsky Exact system identification with missing data , 2013, 52nd IEEE Conference on Decision and Control.

[26]  M. Gerstein,et al.  RNA-Seq: a revolutionary tool for transcriptomics , 2009, Nature Reviews Genetics.

[27]  Yehuda Koren,et al.  Matrix Factorization Techniques for Recommender Systems , 2009, Computer.

[28]  Byunghan Lee,et al.  Deep learning in bioinformatics , 2016, Briefings Bioinform..

[29]  Mayank Vatsa,et al.  Deep Dictionary Learning , 2016, IEEE Access.

[30]  W. Huber,et al.  Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 , 2014, Genome Biology.

[31]  Charles H. Yoon,et al.  Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq , 2016, Science.

[32]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[33]  Yin Zhang,et al.  Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm , 2012, Mathematical Programming Computation.

[34]  Debarka Sengupta,et al.  Fast, scalable and accurate differential expression analysis for single cells , 2016, bioRxiv.