Multi-stage multi-task feature learning via adaptive threshold

Multi-task feature learning aims to identify the features shared across tasks in order to improve generalization. Recent work has shown that non-convex learning models often return better solutions than their convex alternatives. Accordingly, a non-convex model based on the capped-ℓ1,ℓ1 regularization was proposed in [1], together with an efficient multi-stage multi-task feature learning algorithm (MSMTFL). However, that method uses a fixed threshold in the capped-ℓ1,ℓ1 regularizer, and this lack of adaptivity can lead to suboptimal practical performance. In this paper we propose to employ an adaptive threshold in the capped-ℓ1,ℓ1 regularized formulation; the corresponding variant of MSMTFL incorporates an additional scheme that determines the threshold adaptively. Since the threshold is meant to separate the true nonzero components of large magnitude from the rest, the heuristic of detecting the "first significant jump," proposed in [2], is applied to set its value. A preliminary theoretical analysis is provided to guarantee the feasibility of the proposed method, and several numerical experiments demonstrate that it outperforms existing state-of-the-art feature learning approaches.
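To make the construction concrete, the sketch below is an illustrative reimplementation, not the authors' code. It combines the two ingredients described above: the multi-stage reweighting of MSMTFL for the capped-ℓ1,ℓ1 penalty λ · Σ_j min(‖w^j‖_1, θ), where w^j is the j-th row of the weight matrix across tasks, and a "first significant jump" rule that picks θ from the sorted row norms. The proximal-gradient inner solver, the jump tolerance `tau`, the median fallback when no jump is detected, and all function and parameter names are assumptions made for the sake of a runnable example.

```python
import numpy as np

def first_significant_jump(row_norms, tau):
    """Choose the capping threshold at the 'first significant jump' of the
    sorted l1 row norms, in the spirit of iterative support detection.
    tau is a user-chosen jump tolerance (an assumption of this sketch)."""
    s = np.sort(row_norms)
    gaps = np.diff(s)
    jumps = np.flatnonzero(gaps > tau)
    if jumps.size == 0:
        return np.median(s)      # fallback when no clear jump exists
    return s[jumps[0]]           # value just below the first large gap

def msmtfl_adaptive(Xs, ys, lam, tau, n_stages=5, n_inner=200, step=1e-3):
    """Illustrative MSMTFL variant with an adaptively chosen threshold.

    Xs, ys: per-task design matrices X_k (n_k x d) and targets y_k.
    Each stage solves a row-weighted l1,1-penalized least-squares problem
    by proximal gradient, then re-chooses the threshold and the weights."""
    d, m = Xs[0].shape[1], len(Xs)
    W = np.zeros((d, m))
    weights = lam * np.ones(d)                    # per-row penalty weights
    for _ in range(n_stages):
        for _ in range(n_inner):
            grad = np.column_stack([
                Xs[k].T @ (Xs[k] @ W[:, k] - ys[k]) / len(ys[k])
                for k in range(m)])
            W -= step * grad
            # prox of the weighted l1,1 penalty: entrywise soft-thresholding
            W = np.sign(W) * np.maximum(np.abs(W) - step * weights[:, None], 0.0)
        row_norms = np.abs(W).sum(axis=1)         # ||w^j||_1 for each row j
        theta = first_significant_jump(row_norms, tau)
        # rows whose norm already exceeds theta are treated as detected
        # features: their penalty is dropped at the next stage, mimicking
        # the multi-stage relaxation of the capped-l1,l1 regularizer
        weights = lam * (row_norms < theta).astype(float)
    return W
```

In the fixed-threshold MSMTFL of [1], θ would stay constant across stages; here it is re-estimated from the current iterate at every stage, which is the modification the abstract describes.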

[1] Murat Dundar, et al. An Improved Multi-task Learning Approach with Applications in Medical Diagnosis, 2008, ECML/PKDD.

[2] Jiayu Zhou, et al. Efficient multi-task feature learning with calibration, 2014, KDD.

[3] Yilun Wang, et al. Randomized structural sparsity via constrained block subsampling for improved sensitivity of discriminative voxel identification, 2014, NeuroImage.

[4] Jieping Ye, et al. Learning incoherent sparse and low-rank patterns from multiple tasks, 2010.

[5] Jieping Ye, et al. Multi-Task Feature Learning via Efficient ℓ2,1-Norm Minimization, 2009, UAI.

[6] Martin J. Wainwright, et al. Estimation of (near) low-rank matrices with noise and high-dimensional scaling, 2009, ICML.

[7] Jieping Ye, et al. Robust multi-task feature learning, 2012, KDD.

[8] Tong Zhang. Multi-stage Convex Relaxation for Feature Selection, 2011, arXiv:1106.0565.

[9] Kilian Q. Weinberger, et al. Large Margin Multi-Task Metric Learning, 2010, NIPS.

[10] Jieping Ye, et al. Multi-stage multi-task feature learning, 2012, J. Mach. Learn. Res.

[11] M. Kowalski. Sparse regression using mixed norms, 2009.

[12] Yoshua Bengio, et al. Multi-Task Learning for Stock Selection, 1996, NIPS.

[13] Wotao Yin, et al. Sparse Signal Reconstruction via Iterative Support Detection, 2010.

[14] Tong Zhang, et al. A General Theory of Concave Regularization for High-Dimensional Sparse Estimation Problems, 2011, arXiv:1108.4988.

[15] Tong Zhang, et al. Analysis of Multi-stage Convex Relaxation for Sparse Regularization, 2010, J. Mach. Learn. Res.

[16] S. van de Geer, et al. On the conditions used to prove oracle results for the Lasso, 2009, arXiv:0910.0722.

[17] M. Yuan, et al. Model selection and estimation in regression with grouped variables, 2006.

[18] Wotao Yin, et al. Iteratively reweighted algorithms for compressive sensing, 2008, IEEE International Conference on Acoustics, Speech and Signal Processing.

[19] Tong Zhang. Some sharp performance bounds for least squares regression with L1 regularization, 2009, arXiv:0908.2869.

[20] Wotao Yin, et al. Sparse Signal Reconstruction via Iterative Support Detection, 2009, SIAM J. Imaging Sci.

[21] Eric P. Xing, et al. Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity, 2009, ICML.

[22] Massimiliano Pontil, et al. Convex multi-task feature learning, 2008, Machine Learning.

[24] Jieping Ye, et al. Learning Incoherent Sparse and Low-Rank Patterns from Multiple Tasks, 2010, TKDD.

[25] Dit-Yan Yeung, et al. Multi-Task Learning using Generalized t Process, 2010, AISTATS.

[26] E. Candès, et al. Controlling the false discovery rate via knockoffs, 2014, arXiv:1404.5609.

[27] Anton Schwaighofer, et al. Learning Gaussian processes from multiple tasks, 2005, ICML.

[28] Massimiliano Pontil, et al. Taking Advantage of Sparsity in Multi-Task Learning, 2009, COLT.

[29] Ali Jalali, et al. A Dirty Model for Multiple Sparse Regression, 2011, IEEE Transactions on Information Theory.