An Inequality with Applications to Structured Sparsity and Multitask Dictionary Learning

An inequality is derived from concentration inequalities for the suprema of Gaussian and Rademacher processes. It is applied to sharpen existing bounds, and to derive novel ones, on the empirical Rademacher complexities of unit balls in various norms arising in structured sparsity and in multitask dictionary learning or matrix factorization. A key role is played by the largest eigenvalue of the data covariance matrix.
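To make the central quantities concrete, here is a minimal numerical sketch (not taken from the paper): a Monte Carlo estimate of the empirical Rademacher complexity of the linear function class with an l2-ball constraint, compared against the classical bound sqrt(trace(C)/n), where C is the empirical covariance whose largest eigenvalue the abstract highlights. The data, sample sizes, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: n points in d dimensions (assumed, not from the paper).
n, d = 200, 10
X = rng.normal(size=(n, d))

# For F = {x -> <w, x> : ||w||_2 <= 1}, the empirical Rademacher
# complexity is (1/n) E_sigma || sum_i sigma_i x_i ||_2, since the
# dual of the l2 norm is the l2 norm itself.
trials = 2000
vals = []
for _ in range(trials):
    sigma = rng.choice([-1.0, 1.0], size=n)  # Rademacher signs
    vals.append(np.linalg.norm(sigma @ X) / n)
estimate = np.mean(vals)

# Classical bound via Jensen's inequality: sqrt(trace(C)/n), with
# C = (1/n) X^T X the empirical covariance.  Its largest eigenvalue
# lambda_max(C) <= trace(C) is the quantity the sharper bounds for
# structured-sparsity norms are expressed in terms of.
C = X.T @ X / n
bound = np.sqrt(np.trace(C) / n)
lam_max = np.linalg.eigvalsh(C).max()

print(estimate <= bound)
print(lam_max <= np.trace(C))
```

For norms such as the group lasso or k-support norm the dual norm in the supremum changes, but the same Monte Carlo template applies.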
