Localized Lasso for High-Dimensional Regression

We introduce the localized Lasso, which is suited to learning models that are both interpretable and highly predictive in problems with high dimensionality $d$ and small sample size $n$. More specifically, we consider a function defined by local sparse models, one at each data point. We introduce sample-wise network regularization to borrow strength across the models, and sample-wise exclusive group sparsity (a.k.a. the $\ell_{1,2}$ norm) to introduce diversity into the choice of feature sets in the local models. The local models are interpretable in terms of the similarity of their sparsity patterns. The cost function is convex, and thus has a globally optimal solution. Moreover, we propose a simple yet efficient iterative least-squares-based optimization procedure for the localized Lasso, which requires no tuning parameter and is guaranteed to converge to a globally optimal solution. The solution is empirically shown to outperform alternatives on both simulated and genomic personalized medicine data.
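
To make the three ingredients named above concrete (a per-sample fit, a network penalty that couples the local models, and an exclusive $\ell_{1,2}$ penalty within each local model), the following is a minimal sketch of how such a cost could be evaluated. It is an illustration under stated assumptions, not the authors' implementation: the function name `localized_lasso_objective`, the similarity matrix `R`, the weights `lam_net` and `lam_exc`, and the exact form of each term (squared loss, unsquared Euclidean difference between local models, squared $\ell_1$ norm per sample) are all assumed here and are not specified in the abstract.

```python
import math

def localized_lasso_objective(X, y, W, R, lam_net, lam_exc):
    """Evaluate an assumed localized-Lasso-style cost.

    X: n rows of d features; y: n targets; W: n local weight vectors
    (one per sample); R: n x n non-negative similarities (assumed given).
    """
    n = len(X)
    # Squared error of each sample under its own local linear model.
    fit = sum(
        (y[i] - sum(w * x for w, x in zip(W[i], X[i]))) ** 2
        for i in range(n)
    )
    # Network term: similar samples are pulled towards similar models.
    network = sum(
        R[i][j] * math.dist(W[i], W[j])
        for i in range(n) for j in range(n)
    )
    # Exclusive term: squared l1 norm of each local model, which
    # induces competition among features within a single model.
    exclusive = sum(sum(abs(w) for w in W[i]) ** 2 for i in range(n))
    return fit + lam_net * network + lam_exc * exclusive

# Tiny hypothetical example: two samples, two features.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
W = [[1.0, 0.0], [0.0, 2.0]]
R = [[0.0, 1.0], [1.0, 0.0]]
value = localized_lasso_objective(X, y, W, R, lam_net=1.0, lam_exc=1.0)
```

Each term in this sketch is convex in `W`, which is consistent with the convexity stated above; the iterative least-squares solver mentioned in the abstract is not reproduced here.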
