High-dimensional graphs and variable selection with the Lasso

The pattern of zero entries in the inverse covariance matrix of a multivariate normal distribution corresponds to conditional independence restrictions between variables. Covariance selection aims to estimate these structural zeros from data. We show that neighborhood selection with the Lasso is a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs. Neighborhood selection estimates the conditional independence restrictions separately for each node in the graph and is therefore equivalent to variable selection for Gaussian linear models. We show that the proposed neighborhood selection scheme is consistent for sparse high-dimensional graphs. Consistency hinges on the choice of the penalty parameter: the oracle value for optimal prediction does not yield a consistent neighborhood estimate. If the penalty is instead chosen to control the probability of falsely joining distinct connectivity components of the graph, consistent estimation for sparse graphs is achieved at exponential rates, even when the number of variables grows as the number of observations raised to an arbitrary power.

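Concretely, the per-node regressions behind neighborhood selection can be sketched as follows. This is a minimal illustration rather than the paper's exact estimator: it assumes scikit-learn's Lasso, uses a hypothetical fixed penalty alpha (the paper derives a specific sample-size-dependent choice to control false component joins), and symmetrizes the per-node estimates with the "or" rule, one of the two combination rules the approach admits.

```python
import numpy as np
from sklearn.linear_model import Lasso

def neighborhood_selection(X, alpha=0.1):
    """Estimate graph edges by Lasso-regressing each variable on all others.

    X     : (n, p) data matrix, one column per node of the graph.
    alpha : Lasso penalty; as the abstract stresses, this choice
            drives consistency (fixed here purely for illustration).
    Returns a symmetrized (p, p) boolean adjacency estimate.
    """
    n, p = X.shape
    adj = np.zeros((p, p), dtype=bool)
    for j in range(p):
        # Regress node j on the remaining p - 1 nodes.
        others = np.delete(np.arange(p), j)
        model = Lasso(alpha=alpha).fit(X[:, others], X[:, j])
        # Nonzero coefficients define the estimated neighborhood of j.
        adj[j, others] = model.coef_ != 0
    # Combine per-node estimates with the "or" rule; an "and" rule
    # (adj & adj.T) is the stricter alternative.
    return adj | adj.T
```

As a usage sketch, calling neighborhood_selection(X, alpha=0.1) on an (n, p) sample from a sparse Gaussian graphical model returns a boolean adjacency estimate; in practice the penalty would be set by the component-separation criterion described in the abstract rather than fixed a priori.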