Accuracy of Pseudo-Inverse Covariance Learning—A Random Matrix Theory Analysis

For many learning problems, an estimate of the inverse population covariance matrix is required and is often obtained by inverting the sample covariance matrix. Increasingly, for modern scientific data sets, the number of sample points is smaller than the number of features, so the sample covariance matrix is singular and cannot be inverted. In such circumstances, the Moore-Penrose pseudo-inverse sample covariance matrix, constructed from the eigenvectors corresponding to nonzero sample covariance eigenvalues, is often used as an approximation to the inverse population covariance matrix. The reconstruction error of this estimate can be quantified by the Frobenius norm of its difference from the true inverse covariance. The reconstruction error is dominated by the smallest nonzero sample covariance eigenvalues and diverges as the sample size becomes comparable to the number of features. For high-dimensional data, we use random matrix theory techniques and results to study the reconstruction error for a wide class of population covariance matrices. We also show how bagging and random subspace methods can reduce the reconstruction error and can be combined to improve the accuracy of classifiers that utilize the pseudo-inverse sample covariance matrix. We test our analysis on both simulated and benchmark data sets.
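
As a rough illustration of the quantities discussed above, the NumPy sketch below builds a singular sample covariance matrix from fewer samples than features, forms its Moore-Penrose pseudo-inverse from the nonzero eigenvalues, and measures the Frobenius-norm reconstruction error against the true inverse population covariance; it also averages pseudo-inverses over bootstrap resamples in the spirit of bagging. The population covariance, dimensions, eigenvalue cutoff, and number of bags are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: d features, n < d samples, so the sample covariance is singular.
d, n = 200, 50

# Assumed population covariance: identity plus a couple of stronger directions.
# The paper's analysis covers a much wider class of population covariances.
pop_cov = np.eye(d)
pop_cov[0, 0] = 5.0
pop_cov[1, 1] = 3.0
inv_pop_cov = np.linalg.inv(pop_cov)

# Draw n samples and form the (rank-deficient) sample covariance matrix.
X = rng.multivariate_normal(np.zeros(d), pop_cov, size=n)
sample_cov = X.T @ X / n

# Moore-Penrose pseudo-inverse: keep only eigenvectors with nonzero eigenvalues
# and invert those eigenvalues (cutoff below is an arbitrary numerical tolerance).
eigvals, eigvecs = np.linalg.eigh(sample_cov)
keep = eigvals > 1e-10 * eigvals.max()
pinv_cov = (eigvecs[:, keep] / eigvals[keep]) @ eigvecs[:, keep].T

# Frobenius-norm reconstruction error relative to the true inverse covariance.
err = np.linalg.norm(pinv_cov - inv_pop_cov, ord="fro")
print(f"Pseudo-inverse Frobenius reconstruction error: {err:.2f}")

# Bagging sketch: average pseudo-inverses computed on bootstrap resamples.
# The paper argues this kind of averaging can reduce the reconstruction error;
# whether it does in this toy run depends on the dimensions chosen above.
n_bags = 20
bagged = np.zeros((d, d))
for _ in range(n_bags):
    idx = rng.integers(0, n, size=n)          # bootstrap resample of the rows
    Sb = X[idx].T @ X[idx] / n
    lb, vb = np.linalg.eigh(Sb)
    kb = lb > 1e-10 * lb.max()
    bagged += (vb[:, kb] / lb[kb]) @ vb[:, kb].T
bagged /= n_bags
err_bag = np.linalg.norm(bagged - inv_pop_cov, ord="fro")
print(f"Bagged pseudo-inverse Frobenius reconstruction error: {err_bag:.2f}")
```

Retaining only the eigenvectors with nonzero eigenvalues and inverting those eigenvalues is precisely what defines the Moore-Penrose pseudo-inverse of the (symmetric, positive semi-definite) sample covariance; numpy.linalg.pinv would produce the same matrix with its own tolerance for discarding near-zero eigenvalues. A random subspace variant, not shown here, would instead pseudo-invert covariances built from random feature subsets and combine the results.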
