High-dimensional covariance decomposition into sparse Markov and independence models

Fitting high-dimensional data involves a delicate tradeoff between faithful representation and the use of sparse models. Too often, sparsity assumptions on the fitted model are too restrictive to provide a faithful representation of the observed data. In this paper, we present a novel framework that incorporates sparsity in two domains. We decompose the observed covariance matrix into a sparse Gaussian Markov model (with a sparse precision matrix) and a sparse independence model (with a sparse covariance matrix). Our framework includes sparse covariance estimation and sparse precision estimation as special cases, and thus introduces a richer class of high-dimensional models. We characterize sufficient conditions for identifiability of the two models, namely the Markov and independence models. We propose an efficient decomposition method based on a modification of the popular $\ell_1$-penalized maximum-likelihood estimator ($\ell_1$-MLE). We establish that our estimator is consistent in both domains, i.e., it successfully recovers the supports of both the Markov and the independence models when the number of samples $n$ scales as $n = \Omega(d^2 \log p)$, where $p$ is the number of variables and $d$ is the maximum node degree in the Markov model. Our experiments validate these results and also demonstrate that our models have better inference accuracy under simple algorithms such as loopy belief propagation.
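As an illustration of the model class (not of the estimator), the following minimal NumPy sketch builds a covariance of the form $\Sigma = J_M^{-1} + \Sigma_R$, where $J_M$ is a sparse precision matrix (the Markov model) and $\Sigma_R$ is a sparse covariance component (the independence model). The notation, the chain graph, the coupling value rho = 0.3, the zero diagonal of $\Sigma_R$, and the helper names are assumptions chosen for this example. It shows that $\Sigma$ itself can be fully dense even when both components are sparse, which is one way a single sparse covariance model can be too restrictive for such data.

```python
import numpy as np


def chain_precision(p, rho=0.3):
    """Sparse precision matrix J_M of a chain-structured Gaussian Markov model.

    Tridiagonal with unit diagonal and -rho on the first off-diagonals;
    its eigenvalues lie in (1 - 2*rho, 1 + 2*rho), so it is positive
    definite for rho < 0.5. Maximum node degree d = 2.
    """
    J = np.eye(p)
    idx = np.arange(p - 1)
    J[idx, idx + 1] = -rho
    J[idx + 1, idx] = -rho
    return J


def sparse_residual(p, pairs, value=0.1):
    """Hypothetical sparse covariance component Sigma_R: zero diagonal and
    a symmetric entry `value` for each (i, j) in `pairs`."""
    S = np.zeros((p, p))
    for i, j in pairs:
        S[i, j] = S[j, i] = value
    return S


def support_fraction(M, tol=1e-8):
    """Fraction of off-diagonal entries of M with magnitude above tol."""
    off_diag = ~np.eye(M.shape[0], dtype=bool)
    return np.mean(np.abs(M[off_diag]) > tol)


p, n = 10, 2000
J_M = chain_precision(p)
Sigma_R = sparse_residual(p, [(0, 9), (2, 7)])

# Observed covariance = Markov part (inverse of sparse J_M) + independence part.
Sigma = np.linalg.inv(J_M) + Sigma_R
assert np.linalg.eigvalsh(Sigma).min() > 0  # still a valid covariance matrix

# Draw zero-mean Gaussian samples and form the sample covariance.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Sigma_hat = X.T @ X / n

print("off-diagonal support of J_M:    ", support_fraction(J_M))
print("off-diagonal support of Sigma_R:", support_fraction(Sigma_R))
print("off-diagonal support of Sigma:  ", support_fraction(Sigma))
```

With p = 10, $J_M$ has 18 of the 90 off-diagonal entries nonzero (maximum degree d = 2) and $\Sigma_R$ has 4, while every entry of $\Sigma$ is nonzero because the inverse of a connected chain precision matrix with negative couplings is entrywise positive. Recovering the two supports from Sigma_hat is the task the modified $\ell_1$-MLE addresses; this sketch does not implement that estimator.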
