LASSO-Penalized Clusterwise Linear Regression Modeling With Local Least Angle Regression (L-LARS)

In clusterwise regression analysis, the goal is to predict a response variable from a set of explanatory variables, each with cluster-specific effects. The number of candidate explanatory variables is typically large: some may be useful, whereas others contribute very little to the prediction. A well-known method for variable selection is the LASSO, with the tuning parameter calibrated by minimizing the Bayesian Information Criterion (BIC). However, current LASSO-penalized estimators have several disadvantages: only certain types of penalties are considered, the computations may rely on approximate schemes, and they can be very time-consuming, with overly complex tuning of the penalty term. This is usually due to the possibly large number of times the estimator must be evaluated, once for each plausible value of the tuning parameter(s). To ease this computation, we introduce an Expectation-Maximization (EM) algorithm with closed-form updates that works with a very general version of the LASSO penalty. This EM algorithm is based on an iterative scheme in which the component-specific LASSO regression coefficients are computed by coordinate descent updates. Least Angle Regression is then used to perform covariate selection while evaluating the estimator only once. The advantages of our proposal, in terms of reduced computation time and accuracy of model estimation and selection, are shown in a simulation study and illustrated with a real-data application.
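The coordinate descent update for component-specific LASSO coefficients mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain L1 penalty rather than the general penalty of the paper, treats the weights w as the posterior membership probabilities of a single cluster (as they would arise in an M-step), and the names soft_threshold and weighted_lasso_cd are hypothetical. It is written in pure Python for self-containment.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def weighted_lasso_cd(X, y, w, lam, n_sweeps=200, tol=1e-10):
    """Cyclic coordinate descent for the weighted LASSO objective

        0.5 * sum_i w[i] * (y[i] - x_i . beta)^2 + lam * sum_j |beta[j]|

    Assumption of this sketch: w[i] stands in for the posterior
    probability that observation i belongs to the cluster being fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            rho = 0.0    # weighted correlation of x_j with the partial residual
            denom = 0.0  # weighted squared norm of x_j
            for i in range(n):
                # prediction using every coordinate except j
                pred_wo_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += w[i] * X[i][j] * (y[i] - pred_wo_j)
                denom += w[i] * X[i][j] ** 2
            # closed-form minimiser in coordinate j
            new_bj = soft_threshold(rho, lam) / denom if denom > 0 else 0.0
            max_change = max(max_change, abs(new_bj - beta[j]))
            beta[j] = new_bj
        if max_change < tol:
            break
    return beta


if __name__ == "__main__":
    # Toy data with orthogonal columns: the second covariate is weak.
    X = [[1, 0], [0, 1], [1, 0], [0, 1]]
    y = [2, 0.1, 2, 0.1]
    w = [1, 1, 1, 1]
    print(weighted_lasso_cd(X, y, w, lam=0.5))  # prints [1.75, 0.0]
```

In this toy example the penalty shrinks the first coefficient from 2.0 to 1.75 and sets the second exactly to zero, which is the selection behaviour the LASSO is used for. In a full EM scheme a fit of this kind would be repeated per cluster at each M-step, with w updated in the E-step; that outer loop is omitted here.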