Multi-Stage Multi-Task Feature Learning

Multi-task sparse feature learning aims to improve generalization performance by exploiting features shared among tasks. It has been applied successfully in many areas, including computer vision and biomedical informatics. Most existing multi-task sparse feature learning algorithms are formulated as a convex sparse regularization problem, which is usually suboptimal because of its looseness in approximating an ℓ₀-type regularizer. In this paper, we propose a non-convex formulation for multi-task sparse feature learning based on a novel regularizer. To solve the resulting non-convex optimization problem, we propose a Multi-Stage Multi-Task Feature Learning (MSMTFL) algorithm. Moreover, we present a detailed theoretical analysis showing that MSMTFL achieves a better parameter estimation error bound than the convex formulation. Empirical studies on both synthetic and real-world data sets demonstrate the effectiveness of MSMTFL compared with state-of-the-art multi-task sparse feature learning algorithms.
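The abstract does not spell out the regularizer, so the following reconstruction is hedged. In the multi-stage convex relaxation literature, the standard non-convex surrogate for an ℓ₀-type row-sparsity penalty is the capped-ℓ₁ regularizer: writing W ∈ R^{d×m} for the weight matrix over d features and m tasks, w_t for its t-th column (the weights of task t) and w^j for its j-th row (feature j across tasks), such a formulation reads

```latex
\min_{W \in \mathbb{R}^{d \times m}}
  \sum_{t=1}^{m} \frac{1}{2}\,\lVert X_t w_t - y_t \rVert_2^2
  \;+\; \lambda \sum_{j=1}^{d} \min\bigl(\lVert w^j \rVert_1,\ \theta\bigr)
```

where X_t and y_t are task t's design matrix and responses, λ > 0 is the regularization weight, and θ > 0 is the cap. A row with ‖w^j‖₁ ≥ θ pays the constant λθ no matter how large it grows, so the penalty behaves like a scaled count of selected features rather than the looser ℓ₁,₁ norm. A multi-stage algorithm of the kind the abstract names can then be run as a sequence of reweighted ℓ₁ problems: each stage re-solves a convex weighted multi-task Lasso, dropping the penalty on rows whose ℓ₁ norm already exceeds θ. Below is a minimal sketch of that loop; the function name, the least-squares loss, and the proximal-gradient inner solver are our assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def msmtfl_sketch(Xs, ys, lam=0.1, theta=0.5, stages=5, iters=200, lr=None):
    """Hedged sketch of multi-stage multi-task feature learning.

    Each task t has design matrix Xs[t] (n_t x d) and targets ys[t].
    The assumed capped-ell_1 penalty sum_j lam * min(||w^j||_1, theta)
    is handled by a sequence of reweighted ell_1 problems: rows whose
    ell_1 norm already exceeds theta stop being penalized.
    """
    m = len(Xs)
    d = Xs[0].shape[1]
    W = np.zeros((d, m))
    # Per-row penalty weights; stage 1 is the plain convex ell_{1,1} problem.
    lam_row = np.full(d, lam)
    if lr is None:
        # Step size from a crude Lipschitz bound: columns of W decouple
        # across tasks, so max_t ||X_t||_2^2 bounds the smooth part.
        lr = 1.0 / max(np.linalg.norm(X, 2) ** 2 for X in Xs)
    for stage in range(stages):
        for _ in range(iters):
            # Gradient of the smooth part: per-task least-squares losses.
            G = np.zeros_like(W)
            for t in range(m):
                G[:, t] = Xs[t].T @ (Xs[t] @ W[:, t] - ys[t])
            W = W - lr * G
            # Soft-thresholding: prox of the row-weighted ell_1 penalty.
            W = np.sign(W) * np.maximum(np.abs(W) - lr * lam_row[:, None], 0.0)
        # Reweighting step: drop the penalty on rows that cleared theta.
        lam_row = lam * (np.abs(W).sum(axis=1) < theta)
    return W
```

Note that the first stage (all row weights equal to λ) is exactly the convex ℓ₁,₁ baseline, so the sketch also makes plain the sense in which a multi-stage scheme refines the convex formulation rather than replacing it: later stages only remove shrinkage bias from rows the earlier stages already identified as relevant.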
