Covariance selection and estimation via penalised normal likelihood

We propose a nonparametric method for identifying parsimonious structure in a large covariance matrix and for producing a statistically efficient estimator of it. We reparameterise the covariance matrix through the modified Cholesky decomposition of its inverse, or equivalently through the one-step-ahead predictive representation of the vector of responses, thereby reducing the unintuitive task of modelling a covariance matrix to the familiar task of model selection and estimation for a sequence of regression models. The Cholesky factor containing these regression coefficients is likely to have many off-diagonal elements that are zero or close to zero. Penalised normal likelihoods with L1 and L2 penalties are shown to be closely related to Tibshirani's (1996) lasso and to ridge regression, respectively. Adding either penalty to the likelihood yields more stable estimators by shrinking the elements of the Cholesky factor, while the L1 penalty, because of its singularity at zero, also sets some elements exactly to zero and thus produces interpretable models. An algorithm is developed to compute the estimator and to select the tuning parameter. The proposed maximum penalised likelihood estimator is illustrated on simulated data and on a real dataset requiring estimation of a 102 × 102 covariance matrix.
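
For readers unfamiliar with the reparameterisation, here is a brief sketch in the notation of Pourahmadi (1999). For a response vector y = (y_1, ..., y_p)' with covariance matrix \Sigma, there exist a unique unit lower-triangular matrix T and a diagonal matrix D = \mathrm{diag}(d_1^2, \dots, d_p^2) such that

    T \Sigma T' = D, \qquad \text{equivalently} \qquad \Sigma^{-1} = T' D^{-1} T.

The below-diagonal entries of T are the negatives of the coefficients in the one-step-ahead predictive regressions

    y_t = \sum_{j=1}^{t-1} \phi_{tj} y_j + \varepsilon_t, \qquad \mathrm{var}(\varepsilon_t) = d_t^2, \qquad t = 2, \dots, p,

so modelling \Sigma reduces to p - 1 unconstrained regressions, and the L1 and L2 penalties act on the coefficients \phi_{tj}.

To make the regression interpretation concrete, the following Python sketch estimates T and D by fitting each predictive regression with a lasso penalty. It is a simplified illustration under our own assumptions (the function name cholesky_lasso, a single tuning parameter lam shared across all rows, and scikit-learn's coordinate-descent lasso), not the authors' algorithm, which maximises the penalised normal likelihood and selects the tuning parameter automatically.

import numpy as np
from sklearn.linear_model import Lasso

def cholesky_lasso(Y, lam):
    """Return (T, d2): unit lower-triangular T and innovation variances d2,
    giving the precision estimate T' diag(1/d2) T.

    Y is an n x p data matrix (rows are observations, assumed centred);
    lam is the L1 tuning parameter. Hypothetical sketch, not the authors'
    estimator."""
    n, p = Y.shape
    T = np.eye(p)
    d2 = np.empty(p)
    d2[0] = Y[:, 0].var()  # the first variable has no predecessors
    for t in range(1, p):
        # Lasso-penalised regression of y_t on its predecessors y_1, ..., y_{t-1}.
        fit = Lasso(alpha=lam, fit_intercept=False).fit(Y[:, :t], Y[:, t])
        T[t, :t] = -fit.coef_  # row t of T: negatives of the regression coefficients
        resid = Y[:, t] - Y[:, :t] @ fit.coef_
        d2[t] = resid.var()    # innovation variance d_t^2
    return T, d2

# Example: a sparse 10 x 10 precision estimate from 200 simulated observations.
rng = np.random.default_rng(0)
Y = rng.standard_normal((200, 10))
T, d2 = cholesky_lasso(Y, lam=0.1)
Sigma_inv_hat = T.T @ np.diag(1.0 / d2) @ T

Because the lasso sets some coefficients exactly to zero, the corresponding off-diagonal entries of T vanish, which is the sparsity and interpretability the abstract refers to.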

[1] Craven, P. & Wahba, G. (1978). Smoothing noisy data with spline functions. Numerische Mathematik 31, 377–403.

[2] Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. New York: Wiley.

[3] Frank, I. E. & Friedman, J. H. (1993). A statistical view of some chemometrics regression tools. Technometrics 35, 109–135.

[4] Diggle, P. J., Liang, K.-Y. & Zeger, S. L. (1994). Analysis of Longitudinal Data. Oxford: Oxford University Press.

[5] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58, 267–288.

[6] Fu, W. J. (1998). Penalized regressions: the bridge versus the lasso. Journal of Computational and Graphical Statistics 7, 397–416.

[7] Diggle, P. J. & Verbyla, A. P. (1998). Nonparametric estimation of covariance structure in longitudinal data. Biometrics 54, 401–415.

[8] Pourahmadi, M. (1999). Joint mean-covariance models with applications to longitudinal data: unconstrained parameterisation. Biometrika 86, 677–690.

[9] Hoerl, A. E. & Kennard, R. W. (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67.

[10] Pourahmadi, M. (2000). Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix. Biometrika 87, 425–435.

[11] Fan, J. & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96, 1348–1360.

[12] Øjelund, H., Madsen, H. & Thyregod, P. (2001). Calibration with absolute shrinkage. Journal of Chemometrics 15, 497–509.

[13] Boik, R. J. (2002). Spectral models for covariance matrices. Biometrika 89, 159–182.

[14] Smith, M. & Kohn, R. (2002). Parsimonious covariance matrix estimation for longitudinal data. Journal of the American Statistical Association 97, 1141–1153.

[15] Ledoit, O. & Wolf, M. (2004). Honey, I shrunk the sample covariance matrix. Journal of Portfolio Management 30, 110–119.

[16] Wu, W. B. & Pourahmadi, M. (2003). Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika 90, 831–844.

[17] Wong, F., Carter, C. K. & Kohn, R. (2003). Efficient estimation of covariance selection models. Biometrika 90, 809–830.

[18] Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis, 3rd ed. Hoboken, NJ: Wiley.

[19] Diggle, P. J., Heagerty, P., Liang, K.-Y. & Zeger, S. L. (2002). Analysis of Longitudinal Data, 2nd ed. Oxford: Oxford University Press.

[20] Brown, L., Gans, N., Mandelbaum, A., Sakov, A., Shen, H., Zeltyn, S. & Zhao, L. (2005). Statistical analysis of a telephone call center: a queueing-science perspective. Journal of the American Statistical Association 100, 36–50.

[21] Hunter, D. R. & Li, R. (2005). Variable selection using MM algorithms. Annals of Statistics 33, 1617–1642.