Optimal Shrinkage of Singular Values

We consider the recovery of low-rank matrices from noisy data by shrinkage of singular values, in which a single univariate nonlinearity is applied to each of the empirical singular values. We adopt an asymptotic framework in which the matrix size is much larger than the rank of the signal matrix to be recovered, and the signal-to-noise ratio of the low-rank piece stays constant. For a variety of loss functions, including mean squared error (MSE, the squared Frobenius norm), the nuclear norm loss, and the operator norm loss, we show that in this framework there is a well-defined asymptotic loss, which we evaluate precisely in each case. In fact, each of the loss functions we study admits a \emph{unique admissible} shrinkage nonlinearity dominating all other nonlinearities. We provide a general method for evaluating these optimal nonlinearities, and demonstrate our framework by working out simple, explicit formulas for the optimal nonlinearities in the Frobenius, nuclear, and operator norm cases. For example, for a square low-rank $n$-by-$n$ matrix observed in white noise with level $\sigma$, the optimal nonlinearity for MSE loss simply shrinks each data singular value $y$ to $\sqrt{y^2 - 4n\sigma^2}$ (or to $0$ if $y < 2\sqrt{n}\,\sigma$). This optimal nonlinearity guarantees an asymptotic MSE of $2nr\sigma^2$, which compares favorably with optimally tuned hard thresholding and optimally tuned soft thresholding, whose guarantees are $3nr\sigma^2$ and $6nr\sigma^2$, respectively. Our general method also allows one to evaluate optimal shrinkers numerically to arbitrary precision. As an example, we compute optimal shrinkers for the Schatten-$p$ norm loss, for any $p > 0$.
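
The MSE-optimal rule stated above is simple enough to apply in a few lines of code. Below is a minimal sketch, assuming a square $n$-by-$n$ observation and a known noise level $\sigma$; the function name `optimal_mse_shrink` and the demo parameters are illustrative, not taken from the paper.

```python
import numpy as np

def optimal_mse_shrink(Y: np.ndarray, sigma: float) -> np.ndarray:
    """Denoise a square matrix Y by shrinking its empirical singular values.

    Sketch of the MSE-optimal shrinker described above: each singular value y
    is mapped to sqrt(y^2 - 4*n*sigma^2) when y >= 2*sqrt(n)*sigma, else to 0.
    Assumes Y = X + noise with i.i.d. white noise of known level sigma.
    """
    n = Y.shape[0]
    assert Y.shape[0] == Y.shape[1], "this sketch assumes a square matrix"
    U, y, Vt = np.linalg.svd(Y, full_matrices=False)
    cutoff = 2.0 * np.sqrt(n) * sigma
    # np.maximum guards against negative arguments below the cutoff.
    shrunk = np.where(y >= cutoff,
                      np.sqrt(np.maximum(y**2 - 4.0 * n * sigma**2, 0.0)),
                      0.0)
    return (U * shrunk) @ Vt

# Illustrative usage: recover a rank-3 signal whose singular values sit
# well above the 2*sqrt(n)*sigma detection threshold.
rng = np.random.default_rng(0)
n, r, sigma = 200, 3, 1.0
U0, _ = np.linalg.qr(rng.standard_normal((n, r)))
V0, _ = np.linalg.qr(rng.standard_normal((n, r)))
X = np.sqrt(n) * sigma * (U0 @ np.diag([5.0, 4.0, 3.0]) @ V0.T)  # rank-r signal
Y = X + sigma * rng.standard_normal((n, n))                       # noisy observation
X_hat = optimal_mse_shrink(Y, sigma)
mse = np.linalg.norm(X_hat - X, "fro") ** 2
print(f"rank(X_hat) = {np.linalg.matrix_rank(X_hat)}, MSE = {mse:.1f}, "
      f"asymptotic guarantee 2*n*r*sigma^2 = {2 * n * r * sigma**2:.1f}")
```

Note that the shrinker both truncates (singular values below $2\sqrt{n}\,\sigma$ are indistinguishable from noise and are set to zero) and debiases (values above the threshold are pulled down toward the underlying signal singular value), which is what separates it from plain hard thresholding.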
