Discovering Sparse Covariance Structures With the Isomap

Regularization of covariance matrices in high dimensions is usually based either on a known ordering of the variables or on no ordering at all. This article proposes a method for discovering meaningful orderings of variables from their correlations using the Isomap, a nonlinear dimension-reduction technique designed for manifold embeddings. These orderings are then used to construct a sparse covariance estimator that is block-diagonal and/or banded. Finding an ordering to which banding can be applied is desirable because banded estimators have been shown to be consistent in high dimensions. We show that when the variables do have such a structure, the Isomap does very well at discovering it, and the resulting regularized estimator performs better for covariance estimation than regularization methods that ignore variable order, such as thresholding. We also propose a bootstrap approach to constructing the neighborhood graph used by the Isomap, and show that it leads to better estimation. We illustrate the method on protein consumption data, where the variables (food types) have a structure that cannot easily be described a priori, and on a gene expression dataset. Supplementary materials are available online.
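
The abstract gives no implementation details, so the Python sketch below shows only one plausible reading of the pipeline: embed the variables with scikit-learn's Isomap applied to a precomputed correlation-based dissimilarity, sort them by the one-dimensional embedding coordinate, and band the reordered sample covariance. The dissimilarity 1 - |r|, the one-dimensional embedding, and the fixed bandwidth are illustrative assumptions, not necessarily the choices made in the article.

```python
import numpy as np
from sklearn.manifold import Isomap


def isomap_banded_covariance(X, n_neighbors=5, bandwidth=2):
    """Sketch: order the variables via a 1-D Isomap embedding of their
    correlation-based dissimilarities, then band the reordered sample
    covariance. The 1 - |r| dissimilarity and the fixed bandwidth are
    illustrative assumptions, not the authors' exact choices."""
    p = X.shape[1]
    S = np.cov(X, rowvar=False)        # p x p sample covariance
    R = np.corrcoef(X, rowvar=False)   # p x p sample correlations
    D = 1.0 - np.abs(R)                # dissimilarity between variables

    # Isomap builds a k-NN graph on the precomputed dissimilarities and
    # embeds the p variables using graph (geodesic) distances.
    emb = Isomap(n_neighbors=n_neighbors, n_components=1, metric="precomputed")
    coords = emb.fit_transform(D).ravel()
    order = np.argsort(coords)         # discovered variable ordering

    # Band the reordered covariance: keep entries within `bandwidth`
    # of the diagonal, set the rest to zero.
    S_ord = S[np.ix_(order, order)]
    i, j = np.indices((p, p))
    S_banded = np.where(np.abs(i - j) <= bandwidth, S_ord, 0.0)
    return order, S_banded


# Example: AR(1)-type (banded) true covariance with the variables
# presented in a shuffled order, which the embedding should recover.
rng = np.random.default_rng(0)
p, n = 30, 200
true_cov = 0.7 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
perm = rng.permutation(p)
X = rng.multivariate_normal(np.zeros(p), true_cov[np.ix_(perm, perm)], size=n)
order, Sigma_hat = isomap_banded_covariance(X)
```

In practice the bandwidth and number of neighbors would be chosen by cross-validation or a similar criterion, and the bootstrap construction of the neighborhood graph mentioned in the abstract is not shown in this sketch.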
