The Optimal Hard Threshold for Singular Values is 4/√3

We consider recovery of low-rank matrices from noisy data by hard thresholding of singular values, in which empirical singular values below a threshold λ are set to 0. We study the asymptotic MSE (AMSE) in a framework where the matrix size is large compared to the rank of the matrix to be recovered, and the signal-to-noise ratio of the low-rank piece stays constant. The AMSE-optimal choice of hard threshold, in the case of an n-by-n matrix in white noise of level σ, is simply (4/√3)√n σ ≈ 2.309√n σ when σ is known, or simply 2.858 · y_med when σ is unknown, where y_med is the median empirical singular value. For nonsquare m-by-n matrices with m ≠ n, the thresholding coefficients 4/√3 and 2.858 are replaced with provided constants that depend on m/n. Asymptotically, this thresholding rule adapts to unknown rank and unknown noise level in an optimal manner: it is always better than hard thresholding at any other value, and is always better than ideal Truncated SVD (TSVD), which truncates at the true rank of the low-rank matrix we are trying to recover. Hard thresholding at the recommended value to recover an n-by-n matrix of rank r guarantees an AMSE at most 3nrσ². In comparison, the guarantees provided by TSVD, optimally tuned singular value soft thresholding, and the best guarantee achievable by any shrinkage of the data singular values are 5nrσ², 6nrσ², and 2nrσ², respectively. The recommended hard-threshold value also offers, among hard thresholds, the best possible AMSE guarantees for recovering matrices with bounded nuclear norm. Empirical evidence suggests that the performance improvement over TSVD and other popular shrinkage rules can be substantial, for different noise distributions, even for relatively small n.
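The rule described above can be sketched in a few lines of NumPy: take the SVD, zero out every singular value below the threshold, and reconstruct. This is a minimal sketch for the square-matrix case only, using the constants 4/√3 and 2.858 quoted in the abstract; the function name `svht_denoise` is ours, and the nonsquare case (whose constants depend on m/n) is not implemented.

```python
import numpy as np

def svht_denoise(Y, sigma=None):
    """Denoise a square n-by-n matrix Y by singular value hard thresholding.

    Threshold is (4/sqrt(3)) * sqrt(n) * sigma when the noise level sigma
    is known, or 2.858 * (median empirical singular value) when it is not.
    Square case only; nonsquare matrices need m/n-dependent constants.
    """
    n, m = Y.shape
    assert n == m, "this sketch handles the square case only"
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    if sigma is not None:
        tau = (4.0 / np.sqrt(3.0)) * np.sqrt(n) * sigma  # known-noise threshold
    else:
        tau = 2.858 * np.median(s)                       # sigma-free threshold
    s_thr = np.where(s > tau, s, 0.0)                    # hard thresholding
    return U @ np.diag(s_thr) @ Vt
```

Note that for a pure-noise matrix the largest empirical singular value concentrates near 2√n σ, which lies below the threshold 2.309√n σ, so the rule returns the zero matrix rather than fitting noise.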
