An Efficient Projection for l1,∞ Regularization

In recent years the l1,∞ norm has been proposed for joint regularization. In essence, this type of regularization extends the l1 framework for learning sparse models to a setting where the goal is to learn a set of jointly sparse models. In this paper we derive a simple and effective projected gradient method for optimizing l1,∞-regularized problems. The main challenge in developing such a method lies in computing efficient projections onto the l1,∞ ball. We present an algorithm that runs in O(n log n) time and O(n) memory, where n is the number of parameters. We test our algorithm on a multi-task image annotation problem. Our results show that l1,∞ leads to better performance than both l2 and l1 regularization, and that it is effective in discovering jointly sparse solutions.
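For intuition, the simpler and well-known sort-based O(n log n) Euclidean projection onto the plain l1 ball can be sketched as below. This is not the paper's l1,∞ projection, only the analogous single-task case that the paper generalizes; the function name and implementation are illustrative assumptions, not code from the paper.

```python
def project_l1_ball(v, z=1.0):
    """Euclidean projection of vector v onto the l1 ball of radius z.

    Sort-based O(n log n) method shown as intuition only; the paper's
    algorithm handles the harder l1,inf (joint-sparsity) case.
    """
    if sum(abs(x) for x in v) <= z:
        return list(v)                      # already inside the ball
    u = sorted((abs(x) for x in v), reverse=True)  # magnitudes, descending
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        if uj > (cumsum - z) / j:           # coordinate still active
            theta = (cumsum - z) / j        # soft-threshold level
    # shrink each coordinate toward zero by theta
    return [(1 if x >= 0 else -1) * max(abs(x) - theta, 0.0) for x in v]
```

For example, projecting [3, 1] onto the l1 ball of radius 2 soft-thresholds by theta = 1, giving [2, 0].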
