netNMF-sc: leveraging gene-gene interactions for imputation and dimensionality reduction in single-cell expression analysis.

Single-cell RNA-sequencing (scRNA-seq) enables high-throughput measurement of RNA expression in single cells. However, due to technical limitations, scRNA-seq data often contain zero counts for many transcripts in individual cells. These zero counts, or dropout events, complicate the analysis of scRNA-seq data using standard methods developed for bulk RNA-seq data. Current scRNA-seq analysis methods typically overcome dropout by combining information across cells in a lower-dimensional space, leveraging the observation that cells generally occupy a small number of RNA expression states. We introduce netNMF-sc, an algorithm for scRNA-seq analysis that leverages information across both cells and genes. netNMF-sc learns a low-dimensional representation of scRNA-seq transcript counts using network-regularized nonnegative matrix factorization. The network regularization takes advantage of prior knowledge of gene-gene interactions, encouraging pairs of genes with known interactions to lie near each other in the low-dimensional representation. The resulting matrix factorization imputes gene abundance for both zero and nonzero counts and can be used to cluster cells into meaningful subpopulations. Using both simulated and real scRNA-seq data, we show that netNMF-sc outperforms existing methods at clustering cells and estimating gene-gene covariance, with increasing advantages at higher dropout rates (e.g., above 60%). We also show that the results from netNMF-sc are robust to variation in the input network, with more representative networks leading to greater performance gains.
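To make the idea of network-regularized factorization concrete, the following is a minimal sketch, not the netNMF-sc implementation. It assumes a squared-error loss, a graph-Laplacian penalty on the gene factor, and standard multiplicative updates for graph-regularized NMF; the abstract does not specify the loss or optimizer that netNMF-sc uses. The function name `network_regularized_nmf`, the parameters `d`, `lam`, and `n_iter`, and the toy data are all hypothetical.

```python
import numpy as np

def network_regularized_nmf(X, A, d=2, lam=1.0, n_iter=200, eps=1e-10, seed=0):
    """Approximate X (genes x cells) by W @ H with W, H >= 0.

    Minimizes ||X - W H||_F^2 + lam * trace(W^T L W), where L = D - A is the
    Laplacian of the gene-gene network A. The penalty pulls the rows of W
    (gene embeddings) of connected genes toward each other.
    Assumption: squared-error loss with multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, d)) + eps
    H = rng.random((d, n_cells)) + eps
    D = np.diag(A.sum(axis=1))          # degree matrix of the gene network
    L = D - A                           # graph Laplacian
    history = []
    for _ in range(n_iter):
        # Update the cell factor, then the gene factor with the network terms.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T + lam * (A @ W)) / (W @ H @ H.T + lam * (D @ W) + eps)
        loss = np.linalg.norm(X - W @ H) ** 2 + lam * np.trace(W.T @ L @ W)
        history.append(loss)
    return W, H, history

# Toy data (hypothetical): 6 genes in two co-regulated modules, 40 cells.
rng = np.random.default_rng(1)
modules = np.repeat(np.eye(2), 3, axis=0)            # genes x modules
activity = rng.gamma(2.0, 2.0, size=(2, 40))         # modules x cells
X = rng.poisson(modules @ activity).astype(float)
X[rng.random(X.shape) < 0.6] = 0.0                   # simulate 60% dropout

# Prior network: genes in the same module are connected (no self-loops).
A = modules @ modules.T - np.eye(6)

W, H, history = network_regularized_nmf(X, A, d=2, lam=1.0)
imputed = W @ H   # dense estimate covering both zero and nonzero entries
```

Here `imputed` plays the role of the imputed expression matrix, and the columns of `H` give a low-dimensional representation of the cells that could be passed to a clustering method. Larger values of `lam` weight the prior network more heavily relative to the observed counts.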
