Taking Advantage of Sparsity in Multi-Task Learning

We study the problem of estimating multiple linear regression equations for the purposes of both prediction and variable selection. Following recent work on multi-task learning [Argyriou et al., 2008], we assume that the regression vectors share the same sparsity pattern, meaning that the set of relevant predictor variables is the same across all equations. This assumption leads us to consider the Group Lasso as a candidate estimation method. We show that this estimator enjoys sparsity oracle inequalities and variable selection properties. The results hold under a restricted eigenvalue condition and a coherence condition on the design matrix, which naturally extend recent work of Bickel et al. [2007] and Lounici [2008]. In particular, in the multi-task learning scenario, where the number of tasks can grow, we are able to completely remove the effect of the number of predictor variables from the bounds. Finally, we show how our results can be extended to more general noise distributions, for which we only require the variance to be finite.
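
To make the estimator concrete, here is a minimal sketch of the multi-task Group Lasso criterion; the notation is illustrative, and the exact normalization and weighting of the penalty follow the paper's own conventions, which are not reproduced here. Given $T$ regression equations $y_t = X_t\beta_t + \varepsilon_t$, $t = 1, \dots, T$, with $n \times p$ design matrices $X_t$, collect the vectors $\beta_t$ as the columns of a matrix $B \in \mathbb{R}^{p \times T}$ and solve

\[
\hat{B} \in \operatorname*{arg\,min}_{B \in \mathbb{R}^{p \times T}} \ \frac{1}{nT}\sum_{t=1}^{T} \lVert y_t - X_t \beta_t \rVert_2^2 \ + \ 2\lambda \sum_{j=1}^{p} \lVert B_{j\cdot} \rVert_2 ,
\]

where $B_{j\cdot}$ denotes the $j$-th row of $B$ and $\lambda > 0$ is a regularization parameter. The row-wise $\ell_2$ penalty either zeroes out or retains the $j$-th predictor simultaneously in all $T$ equations, which is precisely the shared sparsity pattern assumed above.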

[1] A. Rinaldo, et al. On the asymptotic properties of the group lasso estimator for linear models, 2008.

[2] M. Yuan, et al. Model selection and estimation in regression with grouped variables, 2006.

[3] Lutz Dümbgen, et al. Nemirovski's Inequalities Revisited, 2008, Am. Math. Mon.

[4] Francis R. Bach, et al. Consistency of the group Lasso and multiple kernel learning, 2007, J. Mach. Learn. Res.

[5] Larry A. Wasserman, et al. SpAM: Sparse Additive Models, 2007, NIPS.

[6] M. Pontil, et al. A Convex Optimization Approach to Modeling Consumer Heterogeneity in Conjoint Estimation, 2007.

[7] Andreas Maurer, et al. Bounds for Linear Multi-Task Learning, 2006, J. Mach. Learn. Res.

[8] Jeffrey M. Wooldridge. Econometric Analysis of Cross Section and Panel Data, 2002.

[9] Michael Elad, et al. Stable recovery of sparse overcomplete representations in the presence of noise, 2006, IEEE Transactions on Information Theory.

[10] Cheng Hsiao, et al. Analysis of Panel Data, 1987.

[11] A. Tsybakov, et al. Aggregation for Gaussian regression, 2007, 0710.3654.

[12] C. Chesneau, et al. Some theoretical results on the Grouped Variables Lasso, 2008.

[13] Ming Yuan, et al. Sparse Recovery in Large Ensembles of Kernel Machines, 2008, COLT.

[14] S. van de Geer, et al. High-dimensional additive modeling, 2008, 0806.4115.

[15] Claudio Gentile, et al. Linear Algorithms for Online Multitask Classification, 2010, COLT.

[16] Karim Lounici. Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators, 2008, 0801.4610.

[17] J. Borwein, et al. Convex Analysis and Nonlinear Optimization, 2000.

[18] P. Diggle. Analysis of Longitudinal Data, 1995.

[19] P. Lenk, et al. Hierarchical Bayes Conjoint Analysis: Recovery of Partworth Heterogeneity from Reduced Experimental Designs, 1996.

[20] A. Tsybakov, et al. Sparsity oracle inequalities for the Lasso, 2007, 0705.3308.

[21] P. Bühlmann, et al. The group lasso for logistic regression, 2008.

[22] S. van de Geer. High-dimensional generalized linear models and the Lasso, 2008, 0804.0703.

[23] J. Horowitz, et al. Variable selection in nonparametric additive models, 2010, Annals of Statistics.

[24] Junzhou Huang, et al. The Benefit of Group Sparsity, 2009.

[25] P. Bickel, et al. Simultaneous analysis of Lasso and Dantzig selector, 2008, 0801.1095.

[26] Terence Tao, et al. The Dantzig selector: statistical estimation when p is much larger than n, 2005, math/0506081.

[27] Michael I. Jordan, et al. Union support recovery in high-dimensional multivariate regression, 2008, 46th Annual Allerton Conference on Communication, Control, and Computing.

[28] D. L. Wallace. Bounds on Normal Approximations to Student's and the Chi-Square Distributions, 1959.

[29] Massimiliano Pontil, et al. Convex multi-task feature learning, 2008, Machine Learning.