A Random Matrix Theory Approach to Denoise Single-Cell Data

Summary

Single-cell technologies provide the opportunity to identify new cellular states. However, noise in single-cell data is a major obstacle to identifying biological signals, and the data are also very sparse. We propose a new method based on random matrix theory to analyze and denoise single-cell sequencing data. The method uses the universal distributions that random matrix theory predicts for the eigenvalues and eigenvectors of random covariance (Wishart) matrices to distinguish noise from signal. We also explain how sparsity can cause spurious eigenvector localization, so that some directions in the data are falsely identified as meaningful. We show that roughly 95% of the information in single-cell data is compatible with the predictions of random matrix theory, about 3% is spurious signal induced by sparsity, and only the remaining 2% reflects true biological signal. We demonstrate the effectiveness of our approach by comparing it with alternative techniques on a variety of examples with labeled cell populations.
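
The two ideas in the summary can be illustrated with a small NumPy sketch. The first is separating noise from signal using the eigenvalue distribution of a random Wishart matrix. The second is flagging localized eigenvectors. This is a toy under stated assumptions, not the authors' pipeline, and it rests on these choices:

- the input is a cells-by-genes matrix X with N cells and P genes;
- each gene is standardized, so the noise variance is taken as σ² = 1;
- the Wishart matrix is the gene-gene matrix W = ZᵀZ/N;
- the noise bulk ends at the Marchenko-Pastur upper edge λ+ = σ²(1 + √γ)², with γ = P/N;
- localization is measured by the inverse participation ratio (IPR), the sum of the fourth powers of a unit eigenvector's components.

The function names and the simulated data are hypothetical.

```python
import numpy as np

# Illustrative sketch only, not the paper's implementation. Assumptions:
# X is cells x genes, genes are standardized (noise variance sigma^2 = 1),
# and W = Z^T Z / N is the gene-gene Wishart matrix. All names are hypothetical.


def wishart_spectrum(X):
    """Eigenvalues (descending) and unit eigenvectors (columns) of W = Z^T Z / N.

    Genes that are constant across cells are dropped; the rest are
    standardized to zero mean and unit variance.
    """
    X = np.asarray(X, dtype=float)
    n_cells = X.shape[0]
    std = X.std(axis=0)
    varying = std > 0
    Z = (X[:, varying] - X[:, varying].mean(axis=0)) / std[varying]
    W = Z.T @ Z / n_cells
    eigvals, eigvecs = np.linalg.eigh(W)  # eigh returns ascending order
    return eigvals[::-1], eigvecs[:, ::-1]


def marchenko_pastur_edges(n_cells, n_genes, sigma2=1.0):
    """Bulk edges (lambda_-, lambda_+) of the Marchenko-Pastur law with gamma = P / N."""
    gamma = n_genes / n_cells
    return sigma2 * (1 - np.sqrt(gamma)) ** 2, sigma2 * (1 + np.sqrt(gamma)) ** 2


def inverse_participation_ratio(eigvecs):
    """Sum of v_i^4 per unit column: about 3/P if delocalized, 1/k if spread evenly over k genes."""
    return np.sum(eigvecs ** 4, axis=0)


# Hypothetical toy data: Gaussian noise, one program shared by 100 genes in
# half of the cells, and five identical rare genes detected in only 3 cells.
rng = np.random.default_rng(0)
N, P = 2000, 500
X = rng.standard_normal((N, P))
X[: N // 2, :100] += 1.0   # delocalized signal spread over 100 genes
X[:, -5:] = 0.0
X[:3, -5:] = 1.0           # sparse genes: nonzero in cells 0, 1, 2 only

eigvals, eigvecs = wishart_spectrum(X)
lam_minus, lam_plus = marchenko_pastur_edges(N, eigvecs.shape[0])
ipr = inverse_participation_ratio(eigvecs)

# Eigenvalues above lambda_+ are candidate signal; a large IPR flags localization.
for k in np.flatnonzero(eigvals > lam_plus):
    print(f"eigenvalue {eigvals[k]:.2f} > lambda_+ = {lam_plus:.2f}, IPR = {ipr[k]:.3f}")
```

In this toy, both the shared program and the block of rare genes are expected to produce eigenvalues above λ+ = 2.25. Only the rare-gene eigenvector should be concentrated on a handful of genes, with an IPR of roughly 1/5 rather than about 1/100. Thresholding eigenvalues alone would therefore count the sparsity artifact as signal, which is the failure mode the summary describes. How the paper itself separates the two is not reproduced here.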
