Approximation of Positive Semidefinite Matrices Using the Nyström Method

Positive semidefinite matrices arise in a variety of fields, including statistics, signal processing, and machine learning. Unfortunately, when these matrices are high-dimensional and/or must be operated upon many times, expensive calculations such as the spectral decomposition quickly become a computational bottleneck. A common alternative is to replace the original positive semidefinite matrices with low-rank approximations whose spectral decompositions can be more easily computed. In this thesis, we develop approaches based on the Nyström method, which approximates a positive semidefinite matrix using a data-dependent orthogonal projection. As the Nyström approximation is conditioned on a given principal submatrix of its argument, it essentially recasts low-rank approximation as a subset selection problem. We begin by deriving the Nyström approximation and developing a number of fundamental results, including new characterizations of its spectral properties and approximation error. We then address the problem of subset selection through a study of randomized sampling algorithms. We provide new bounds for the approximation error under uniformly random sampling, as well as bounds for two new data-dependent sampling methods. We continue by extending these results to random positive definite matrices, deriving statistics for the approximation error of matrices with Wishart and beta distributions, as well as for a broader class of orthogonally invariant and residual independent matrices. Once this theoretical foundation has been established, we turn to practical applications of Nyström methods. We explore new exact and approximate sampling methods for randomized subset selection, and develop greedy approaches for subset optimization. We conclude by developing the Nyström approximation as a low-rank covariance estimator that provides for computationally efficient spectral analysis while shrinking the eigenvalues of the sample covariance. After deriving expressions for its bias and mean squared error, we illustrate the effectiveness of the Nyström covariance estimator through empirical examples in adaptive beamforming and image denoising.
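To make the subset selection view concrete, the following is a minimal sketch of the Nyström approximation in its standard form, C W^+ C^T, where W is the principal submatrix indexed by the selected subset, C holds the corresponding columns, and W^+ is the pseudoinverse of W. It is not taken from the thesis: the function name `nystrom_approximation`, the matrix sizes, the random seed, and the use of uniformly random sampling without replacement are all assumptions chosen for illustration.

```python
import numpy as np


def nystrom_approximation(G, idx):
    """Nystrom approximation of a PSD matrix G given a subset of indices.

    Hypothetical helper for illustration: returns C @ pinv(W) @ C.T,
    where W = G[idx, idx] is the selected principal submatrix and
    C = G[:, idx] holds the selected columns.
    """
    C = G[:, idx]                 # n x k block of selected columns
    W = G[np.ix_(idx, idx)]       # k x k principal submatrix
    return C @ np.linalg.pinv(W) @ C.T


# Illustrative data (assumed): a 50 x 50 PSD matrix of rank 8.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
G = X @ X.T

# Subset selection by uniformly random sampling without replacement (k = 5).
idx = rng.choice(G.shape[0], size=5, replace=False)
G_hat = nystrom_approximation(G, idx)

# The residual G - G_hat is itself positive semidefinite (a Schur complement
# padded with zeros), so its Frobenius norm is a natural error measure.
residual = G - G_hat
rel_error = np.linalg.norm(residual) / np.linalg.norm(G)
print(f"relative Frobenius error with k = {len(idx)}: {rel_error:.3f}")
```

In this sketch the approximation reproduces the selected columns of G exactly and has rank at most k, so the quality of the result depends entirely on which indices are chosen; that choice is the subset selection problem the abstract refers to.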
