Discussion of “Correlated variables in regression: Clustering and sparse estimation”

Consider the linear model $Y = X\beta_0 + \varepsilon$. Here $Y \in \mathbb{R}^n$ is the response vector, $X$ is an $n \times p$ matrix, $\beta_0 \in \mathbb{R}^p$ is the vector of coefficients, and $\varepsilon \in \mathbb{R}^n$ is assumed to be multivariate normal with mean zero and covariance matrix $\sigma^2 I$. While it has been shown that the lasso and its many variants “work” in terms of variable selection and prediction, they work best when the columns of $X$ are nearly orthogonal. However, if $p > n$, correlation among the covariates is inevitable, since more than $n$ nonzero vectors in $\mathbb{R}^n$ cannot be mutually orthogonal. It is worth pointing out that the fit of the lasso estimator $\hat{\beta}$ always satisfies