Distributed Lasso for in-network linear regression

The least-absolute shrinkage and selection operator (Lasso) is a popular tool for joint estimation and continuous variable selection, especially well suited for under-determined but sparse linear regression problems. This paper develops an algorithm to estimate the regression coefficients via Lasso when the training data are distributed across different agents, and communicating them to a central processing unit is prohibited due to, e.g., communication cost or privacy reasons. The novel distributed algorithm is obtained after reformulating the Lasso into a separable form, which is iteratively minimized using the alternating-direction method of multipliers (ADMM) so as to gain the desired degree of parallelization. The per-agent estimate updates are given by simple soft-thresholding operations, and the inter-agent communication overhead remains at an affordable level. Without exchanging elements of the different training sets, the local estimates provably reach consensus on the global Lasso solution, i.e., the fit that would be obtained if the entire data set were centrally available. Numerical experiments corroborate the convergence and global optimality of the proposed distributed scheme.
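To make the described workflow concrete, the following is a minimal Python/NumPy sketch of the Lasso solved by consensus ADMM in the standard global-consensus form: each agent j holds a local block (X_j, y_j) of the training data and a local coefficient copy w_j, and a soft-thresholding step enforces sparsity on the shared iterate. This is an illustrative sketch, not a transcription of the paper's in-network algorithm: it averages across all agents for brevity, whereas the paper's scheme exchanges messages only among neighboring agents; the names soft_threshold and consensus_lasso are hypothetical.

import numpy as np

def soft_threshold(v, tau):
    # Element-wise soft-thresholding operator: sign(v) * max(|v| - tau, 0).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def consensus_lasso(X_parts, y_parts, lam, rho=1.0, n_iter=200):
    # Consensus-ADMM sketch for min_w (1/2) sum_j ||y_j - X_j w||^2 + lam ||w||_1,
    # with the data split across J agents (illustrative, not the paper's exact updates).
    J = len(X_parts)
    p = X_parts[0].shape[1]
    w = [np.zeros(p) for _ in range(J)]   # local coefficient estimates
    u = [np.zeros(p) for _ in range(J)]   # scaled dual variables
    z = np.zeros(p)                        # consensus (global) variable
    # Cache a Cholesky factor of each agent's regularized normal equations.
    chol = [np.linalg.cholesky(X.T @ X + rho * np.eye(p)) for X in X_parts]
    for _ in range(n_iter):
        # Local least-squares updates, computable in parallel by each agent.
        for j in range(J):
            rhs = X_parts[j].T @ y_parts[j] + rho * (z - u[j])
            w[j] = np.linalg.solve(chol[j].T, np.linalg.solve(chol[j], rhs))
        # Soft-thresholding step enforcing sparsity on the consensus iterate.
        w_bar = np.mean(w, axis=0)
        u_bar = np.mean(u, axis=0)
        z = soft_threshold(w_bar + u_bar, lam / (rho * J))
        # Dual updates driving the local estimates toward consensus.
        for j in range(J):
            u[j] += w[j] - z
    return z

Because each per-agent factorization is computed once and cached, every iteration costs only two triangular solves per agent plus the exchange of length-p vectors, which illustrates how the communication overhead can remain modest relative to shipping the raw training sets.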