Quasi-universality in single-cell sequencing data

The development of single-cell technologies provides the opportunity to identify new cellular states and reconstruct novel cell-to-cell relationships. Applications range from understanding the transcriptional and epigenetic processes involved in metazoan development to characterizing distinct cell types in heterogeneous populations such as cancers or immune cells. However, analysis of the data is impeded by its unknown intrinsic biological and technical variability, together with its sparseness; these factors complicate the identification of true biological signals amidst artifacts and noise. Here we show that, across technologies, roughly 95% of the eigenvalues derived from each single-cell data set can be described by universal distributions predicted by Random Matrix Theory. Interestingly, the remaining 5% of the spectrum deviates from these distributions and presents a phenomenon known as eigenvector localization, in which information concentrates tightly in groups of cells. Some of the localized eigenvectors reflect underlying biological signal, and some are simply a consequence of the sparsity of single-cell data; roughly 3% of the spectrum is artifactual. Based on the universal distributions and a technique for detecting sparsity-induced localization, we present a strategy to identify the residual 2% of directions that encode biological information and thereby denoise single-cell data. We demonstrate the effectiveness of this approach by comparing it with standard single-cell data analysis techniques in a variety of examples with marked cell populations.
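
The idea of separating a universal bulk from a few informative directions can be illustrated with a minimal sketch on synthetic data. It rests on assumptions that the abstract does not state: the Marchenko-Pastur law is taken as the universal bulk distribution, the inverse participation ratio (IPR) is used as the localization measure, and the data are Gaussian noise with one planted group of cells. The function names `mp_upper_edge` and `inverse_participation_ratio` are hypothetical, and this is not the authors' pipeline.

```python
import numpy as np


def mp_upper_edge(n_cells, n_genes, sigma2=1.0):
    """Upper edge of the Marchenko-Pastur bulk for X @ X.T / n_genes,
    where X is n_cells x n_genes with i.i.d. entries of variance sigma2."""
    gamma = n_cells / n_genes
    return sigma2 * (1.0 + np.sqrt(gamma)) ** 2


def inverse_participation_ratio(vectors):
    """IPR of each column: roughly 3/n for a delocalized Gaussian-like
    vector of length n, and close to 1/k for a vector supported on k cells."""
    v = vectors / np.linalg.norm(vectors, axis=0, keepdims=True)
    return np.sum(v ** 4, axis=0)


# Synthetic stand-in for a normalized expression matrix (assumption).
rng = np.random.default_rng(0)
n_cells, n_genes = 200, 1000
X = rng.standard_normal((n_cells, n_genes))

# Plant a signal: 10 cells share elevated expression in 50 genes.
X[:10, :50] += 4.0

# Cell-by-cell Wishart-type matrix and its spectrum (ascending order).
W = X @ X.T / n_genes
eigenvalues, eigenvectors = np.linalg.eigh(W)

edge = mp_upper_edge(n_cells, n_genes)
outliers = eigenvalues > edge          # candidates for signal
ipr = inverse_participation_ratio(eigenvectors)

n_outliers = int(outliers.sum())
top_eigenvalue = float(eigenvalues[-1])
top_ipr = float(ipr[-1])
bulk_ipr = float(np.median(ipr[~outliers]))

print(f"MP upper edge: {edge:.3f}")
print(f"eigenvalues above the edge: {n_outliers}")
print(f"IPR of top eigenvector: {top_ipr:.3f}, median bulk IPR: {bulk_ipr:.3f}")
```

In this toy setting the planted group produces an eigenvalue well above the bulk edge, and its eigenvector has a much larger IPR than the bulk eigenvectors. Real single-cell data would additionally need the sparsity-induced localized eigenvectors to be filtered out, which this sketch does not attempt.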
