Simultaneous regression shrinkage, variable selection and clustering of predictors with OSCAR

In this paper, a new method called the OSCAR (Octagonal Shrinkage and Clustering Algorithm for Regression) is proposed to simultaneously select variables and perform supervised clustering in the context of linear regression. The technique is based on penalized least squares with a geometrically intuitive penalty function that, like the LASSO penalty, shrinks some coefficients to exactly zero. In addition, this penalty yields exact equality of some coefficients, encouraging correlated predictors that have a similar effect on the response to form clusters represented by a single coefficient. The resulting clusters can then be investigated further to discover what contributes to the group's similar behavior. The OSCAR thus achieves sparsity in terms of the number of unique coefficients in the model. The proposed procedure is shown to compare favorably with existing shrinkage and variable selection techniques in terms of both prediction error and reduced model complexity.
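For concreteness, the penalized criterion can be sketched in its constrained least-squares form, combining an $\ell_1$ norm with a pairwise $\ell_\infty$ norm; the symbols $y$, $X$, $\beta$, $c$, and $t$ are introduced here for illustration and are not taken from the abstract itself.

\[
\hat{\beta} \;=\; \arg\min_{\beta} \; \| y - X\beta \|_2^2
\quad \text{subject to} \quad
\sum_{j} |\beta_j| \;+\; c \sum_{j < k} \max\{ |\beta_j|, |\beta_k| \} \;\le\; t ,
\]

where $c \ge 0$ weights the pairwise $\ell_\infty$ term, whose flat regions in the constraint set induce exact equality of some coefficients (clustering), while the $\ell_1$ term shrinks some coefficients exactly to zero (selection); $t > 0$ controls the overall amount of shrinkage.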