Online Learning for Group Lasso

We develop a novel online learning algorithm for the group lasso that efficiently identifies important explanatory factors in a grouped manner. Unlike traditional batch-mode group lasso algorithms, which suffer from inefficiency and poor scalability, our proposed algorithm operates in an online mode and scales well: at each iteration, the weight vector is updated via a closed-form solution based on the average of the previous subgradients. The proposed online algorithm is therefore highly efficient and scalable, with worst-case time complexity and memory cost both of order O(d), where d is the number of dimensions. Moreover, to achieve sparsity at both the group level and the individual feature level, we extend our online algorithm to efficiently solve several variants of sparse group lasso models. We also show that the online algorithm applies to other group lasso models, such as the group lasso with overlap and the graph lasso. Finally, we demonstrate the merits of our algorithm through experiments on both synthetic and real-world datasets.
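To illustrate the kind of closed-form update the abstract describes, the following is a minimal sketch of a dual-averaging-style group lasso step, where each group's weights are obtained in closed form from the running average of subgradients via group-wise soft-thresholding. The function name, the choice of squared-norm proximal term, and the step-size constant `gamma` are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def group_lasso_da_update(g_bar, groups, t, lam, gamma):
    """One closed-form update from the average subgradient g_bar.

    g_bar : length-d average of the subgradients seen so far
    groups: list of index arrays, one per (non-overlapping) feature group
    t     : iteration count (t >= 1)
    lam   : group lasso regularization strength
    gamma : step-size constant scaling the sqrt(t) proximal term
    """
    w = np.zeros_like(g_bar)
    scale = np.sqrt(t) / gamma
    for idx in groups:
        norm = np.linalg.norm(g_bar[idx])
        if norm > lam:
            # Group survives thresholding: shrink toward zero group-wise.
            w[idx] = -scale * (1.0 - lam / norm) * g_bar[idx]
        # Otherwise the entire group is set exactly to zero,
        # which is what produces group-level sparsity.
    return w
```

Each update touches every coordinate at most once and stores only the running average `g_bar`, which matches the O(d) time and memory claim made in the abstract.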
