Doubly-Stochastic Normalization of the Gaussian Kernel is Robust to Heteroskedastic Noise

A fundamental step in many data-analysis techniques is the construction of an affinity matrix describing similarities between data points. When the data points reside in Euclidean space, a widespread approach is to form an affinity matrix by applying the Gaussian kernel to pairwise distances, and to follow with a certain normalization (e.g., the row-stochastic normalization or its symmetric variant). We demonstrate that the doubly-stochastic normalization of the Gaussian kernel with zero main diagonal (i.e., no self-loops) is robust to heteroskedastic noise. That is, the doubly-stochastic normalization is advantageous in that it automatically accounts for observations with different noise variances. Specifically, we prove that in a suitable high-dimensional setting where heteroskedastic noise does not concentrate too much in any particular direction in space, the resulting (doubly-stochastic) noisy affinity matrix converges to its clean counterpart at rate $m^{-1/2}$, where $m$ is the ambient dimension. We demonstrate this result numerically, and show that, in contrast, the popular row-stochastic and symmetric normalizations behave unfavorably under heteroskedastic noise. Furthermore, we provide examples of simulated and experimental single-cell RNA sequence data with intrinsic heteroskedasticity, where the advantage of the doubly-stochastic normalization for exploratory analysis is evident.
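
As an illustration of the construction described above, the following is a minimal sketch of a doubly-stochastic normalization of a zero-diagonal Gaussian kernel. It is not the authors' implementation. The kernel form exp(-||x_i - x_j||^2 / eps), the bandwidth choice, the damped symmetric Sinkhorn-type update, and the function names `gaussian_kernel_zero_diag` and `doubly_stochastic_scaling` are all assumptions made for this example.

```python
import numpy as np


def gaussian_kernel_zero_diag(X, eps):
    """Gaussian kernel on pairwise squared distances, with no self-loops.

    Assumed form: K_ij = exp(-||x_i - x_j||^2 / eps) for i != j, K_ii = 0.
    """
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negative round-off
    K = np.exp(-sq_dists / eps)
    np.fill_diagonal(K, 0.0)  # zero main diagonal
    return K


def doubly_stochastic_scaling(K, max_iter=1000, tol=1e-10):
    """Find d > 0 such that W = diag(d) K diag(d) has unit row and column sums.

    Uses a damped fixed-point update d <- sqrt(d / (K d)), one common way to
    scale a symmetric nonnegative matrix (an assumption of this sketch).
    """
    d = np.ones(K.shape[0])
    for _ in range(max_iter):
        row_sums = d * (K @ d)
        if np.max(np.abs(row_sums - 1.0)) < tol:
            break
        d = np.sqrt(d / (K @ d))
    return d[:, None] * K * d[None, :]


# Usage on synthetic data: n = 50 points in ambient dimension m = 20.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
K = gaussian_kernel_zero_diag(X, eps=2.0 * X.shape[1])  # illustrative bandwidth
W = doubly_stochastic_scaling(K)
```

Because `K` is symmetric, a single scaling vector `d` is enough: the resulting `W` is symmetric, so unit row sums imply unit column sums.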
