Elementary Estimators for Sparse Covariance Matrices and other Structured Moments

We consider the problem of estimating expectations of vector-valued feature functions, a special case of which is estimating the covariance matrix of a random vector. We are interested in recovery under high-dimensional settings, where the number of features p is potentially larger than the number of samples n, so that structural constraints must be imposed. In a natural distributional setting for this problem, the feature functions comprise the sufficient statistics of an exponential family, so that the problem entails estimating structured moments of exponential family distributions. For instance, in the special case of covariance estimation, the natural distributional setting corresponds to the multivariate Gaussian distribution. Unlike in the inverse covariance estimation case, we show that the regularized MLEs for covariance estimation, as well as natural Dantzig variants, are non-convex, even when the regularization functions themselves are convex; the same holds for the general structured moment case. We propose a class of elementary convex estimators, which in many cases are available in closed form, for estimating general structured moments. We then provide a unified statistical analysis of this class of estimators. Finally, we demonstrate the applicability of our estimators via simulation and on real-world climatology and biology datasets.
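To make the "closed-form" flavor of such elementary estimators concrete, the sketch below implements one representative instance for the sparse covariance case: entrywise soft-thresholding of the sample covariance matrix. This is a minimal illustration under stated assumptions, not the paper's exact construction; the function name, the choice to leave the diagonal unthresholded, and the threshold level lam on the order of sqrt(log p / n) are illustrative assumptions.

```python
import numpy as np

def soft_threshold_covariance(X, lam):
    """Illustrative closed-form estimator for a sparse covariance matrix:
    entrywise soft-thresholding of the off-diagonal entries of the sample
    covariance, with the diagonal left untouched (an assumption for this sketch).
    """
    S = np.cov(X, rowvar=False, bias=True)            # p x p sample covariance
    off_diag = S - np.diag(np.diag(S))                # zero out the diagonal
    # soft-threshold the off-diagonal entries at level lam
    shrunk = np.sign(off_diag) * np.maximum(np.abs(off_diag) - lam, 0.0)
    return shrunk + np.diag(np.diag(S))

# Hypothetical usage with a threshold scaling as sqrt(log p / n)
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
lam = np.sqrt(np.log(p) / n)
Sigma_hat = soft_threshold_covariance(X, lam)
```

The point of the example is only that the estimator is available in closed form: no iterative optimization is required, in contrast to regularized-MLE or Dantzig-type formulations, which the abstract notes are non-convex in this setting.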
