Learning Local Dependence in Ordered Data

In many applications, data come with a natural ordering. This ordering can often induce local dependence among nearby variables. However, in complex data, the width of this dependence may vary from one variable to the next, making simple assumptions such as a constant neighborhood size unrealistic. We propose a framework for learning this local dependence based on estimating the inverse of the Cholesky factor of the covariance matrix. Penalized maximum likelihood estimation of this matrix yields a simple regression interpretation of local dependence in which each variable is predicted by its neighbors. Our proposed method involves solving a convex, penalized Gaussian likelihood problem with a hierarchical group lasso penalty. The problem decomposes into independent subproblems that can be solved efficiently in parallel using first-order methods. Our method yields a sparse, symmetric, positive definite estimator of the precision matrix, encoding a Gaussian graphical model. We derive theoretical results that are not available for existing methods attaining this structure. In particular, our conditions for signed support recovery and our estimation consistency rates in multiple norms are as mild as those in a regression problem. Empirical results show that our method performs favorably compared with existing methods. We apply our method to genomic data to flexibly model linkage disequilibrium, and we also apply it to improve the performance of discriminant analysis in sound recording classification.
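
To make the row-wise decomposition concrete: writing the precision matrix as Omega = T'T with T lower triangular (T is the inverse of the Cholesky factor of the covariance matrix), the negative Gaussian log-likelihood separates across the rows of T, and each row can be penalized by a hierarchical group lasso with nested groups so that entries far from the diagonal are zeroed first. The Python/NumPy sketch below illustrates one such row subproblem solved by proximal gradient descent. It is a minimal illustration under stated assumptions, not the paper's algorithm: the function names, the unit group weights, the fixed step size, and the iteration count are choices made here for concreteness.

```python
import numpy as np

def prox_nested_group_lasso(beta, lam, weights):
    """Prox of the nested group lasso penalty
        lam * sum_g weights[g] * ||beta[:g+1]||_2
    with groups {1} in {1,2} in ... in {1,...,m}.

    For tree-structured l2 groups, the prox is the composition of
    group soft-thresholding operators applied from the smallest
    group outward.
    """
    out = beta.copy()
    for g in range(len(beta)):
        nrm = np.linalg.norm(out[:g + 1])
        thr = lam * weights[g]
        if nrm <= thr:
            out[:g + 1] = 0.0          # zero the whole nested group
        else:
            out[:g + 1] *= 1.0 - thr / nrm
    return out

def fit_row(S, lam, n_iter=500):
    """Solve one row subproblem
        min_t  t' S t - 2 log t[-1] + penalty(t[:-1])
    by proximal gradient descent (a sketch; step-size and
    convergence details are simplified here).

    S is the j x j leading principal submatrix of the sample
    covariance; the result is row j of the inverse Cholesky factor.
    """
    j = S.shape[0]
    t = np.zeros(j)
    t[-1] = 1.0 / np.sqrt(S[-1, -1])    # marginal fit as a starting point
    weights = np.ones(j - 1)            # unit group weights: an assumption
    step = 0.5 / np.linalg.norm(S, 2)   # conservative fixed step size
    for _ in range(n_iter):
        grad = 2.0 * S @ t              # gradient of the quadratic term
        grad[-1] -= 2.0 / t[-1]         # gradient of -2 log t[-1]
        t = t - step * grad
        t[:-1] = prox_nested_group_lasso(t[:-1], step * lam, weights)
        t[-1] = max(t[-1], 1e-8)        # keep the diagonal entry positive
    return t

# The p row subproblems are independent, so they can run in parallel;
# stacking their solutions gives a lower-triangular T and a sparse,
# positive definite precision estimate Omega = T' T.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))
S = X.T @ X / X.shape[0]
T = np.zeros((30, 30))
for j in range(1, 31):
    T[j - 1, :j] = fit_row(S[:j, :j], lam=0.1)
Omega = T.T @ T
```

Because each nested group contains all entries farther from the diagonal, zeroing a group zeroes an entire prefix of the row, so the recovered support in each row is a contiguous band ending at the diagonal whose width can differ from row to row, which is the variable-bandwidth local dependence described above.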
