Submitted to the Annals of Applied Statistics

SUBSPACE KERNEL SPECTRAL CLUSTERING OF LARGE DIMENSIONAL DATA

By Abla Kammoun, Romain Couillet

Let $x_1, \dots, x_n$ be independent observations of size $p$, each belonging to one of $c$ distinct classes. We assume that observations within class $a$ follow the distribution $\mathcal{N}(0, \frac{1}{p} C_a)$, where $C_1, \dots, C_c$ are non-negative definite $p \times p$ matrices. This paper studies the asymptotic behavior of the symmetric matrix $\tilde{\Phi}$ with entries $\tilde{\Phi}_{kl} = \sqrt{p}\,\big( (x_k^{\mathsf{T}} x_l)^2 \, \delta_{k \neq l} \big)$ when $p$ and $n$ grow to infinity with $n/p \to c_0$. In particular, we prove that, if the class covariance matrices are sufficiently close in a certain sense, the matrix $\tilde{\Phi}$ behaves as a low-rank perturbation of a Wigner matrix, possibly presenting some isolated eigenvalues outside the bulk of the semi-circular law. We carry out a careful analysis of some of the isolated eigenvalues and eigenvectors of $\tilde{\Phi}$, and illustrate how these results can help understand spectral clustering methods that use $\tilde{\Phi}$ as a kernel matrix.
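As a rough illustration of the object described above, the following sketch builds $\tilde{\Phi}$ from simulated data and extracts its leading eigenvectors, which is the usual input to a spectral clustering step. It is not the authors' code: the function names `subspace_kernel` and `sample_classes`, the toy covariances, and the choice of $p$ and class sizes are all assumptions made for this example, and the Cholesky-based sampler additionally assumes each $C_a$ is positive definite (the abstract only requires non-negative definite).

```python
import numpy as np

def subspace_kernel(X):
    """Return the n x n matrix with entries sqrt(p) * (x_k^T x_l)^2 for k != l
    and 0 on the diagonal. X has shape (p, n); columns are observations."""
    p, _ = X.shape
    G = X.T @ X                    # Gram matrix of inner products x_k^T x_l
    Phi = np.sqrt(p) * G**2        # entrywise square, scaled by sqrt(p)
    np.fill_diagonal(Phi, 0.0)     # the delta_{k != l} factor zeroes the diagonal
    return Phi

def sample_classes(p, sizes, covs, rng):
    """Hypothetical helper: draw n_a columns x ~ N(0, C_a / p) for each class a.
    Assumes every C_a is positive definite so that Cholesky applies."""
    cols, labels = [], []
    for a, (n_a, C) in enumerate(zip(sizes, covs)):
        L = np.linalg.cholesky(C)  # C = L L^T
        cols.append(L @ rng.standard_normal((p, n_a)) / np.sqrt(p))
        labels += [a] * n_a
    return np.hstack(cols), np.array(labels)

# Toy setting (assumed values): two classes with nearby covariances.
rng = np.random.default_rng(0)
p = 100
C1 = np.eye(p)
C2 = np.diag(np.where(np.arange(p) < p // 2, 1.5, 0.5))  # same trace as C1
X, labels = sample_classes(p, [100, 100], [C1, C2], rng)

Phi = subspace_kernel(X)
eigvals, eigvecs = np.linalg.eigh(Phi)   # eigenvalues in ascending order
top_vectors = eigvecs[:, -2:]            # eigenvectors of the two largest eigenvalues
```

The rows of `top_vectors` could then be passed to any standard clustering routine such as k-means. Whether the classes actually separate depends on how far apart the covariances are relative to $p$ and $n$, which is the question the paper's asymptotic analysis addresses.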
