Greedy Regularized Least-Squares for Multi-task Learning

Multi-task feature selection refers to the problem of selecting a common predictive set of features over multiple related learning tasks. The problem is encountered, for example, in applications where one can afford only a limited set of feature extractors for solving several tasks. In this work, we present a regularized least-squares (RLS) based algorithm for multi-task greedy forward feature selection. The method selects features jointly for all the tasks, using the leave-one-out cross-validation error averaged over the tasks as the selection criterion. While a straightforward implementation of the approach, combining a wrapper algorithm with a black-box RLS training method, would have impractical computational costs, we achieve linear time complexity for the training algorithm through matrix algebra based computational shortcuts. In our experiments on insurance and speech classification data sets, the proposed method shows better prediction performance than baseline methods that select the same number of features independently for each task.
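To make the selection criterion concrete, the following minimal sketch implements the naive wrapper variant of the approach in NumPy: a ridge (RLS) model shares each candidate feature set across all tasks, and candidates are scored by the closed-form leave-one-out error averaged over the tasks. The function names, the regularization parameter lam, and the data layout (Y holding one column per task) are illustrative assumptions; unlike the paper's algorithm, this sketch retrains from scratch for every candidate feature and therefore does not attain the linear-time complexity obtained with the matrix algebra shortcuts.

```python
import numpy as np

def mean_loo_error(X, Y, lam):
    """Task-averaged leave-one-out squared error of RLS, computed in closed
    form via the hat matrix H = X (X^T X + lam*I)^{-1} X^T and the standard
    shortcut e_i = r_i / (1 - H_ii) for the LOO residual of example i."""
    n, d = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
    residuals = Y - H @ Y                            # in-sample residuals, one column per task
    loo = residuals / (1.0 - np.diag(H))[:, None]    # exact LOO residuals, no retraining
    return np.mean(loo ** 2)                         # average over examples and tasks

def greedy_forward_selection(X, Y, k, lam=1.0):
    """Greedily grow one feature set shared by all tasks: at each step, add the
    feature whose inclusion minimizes the task-averaged LOO error."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        scores = {j: mean_loo_error(X[:, selected + [j]], Y, lam) for j in remaining}
        best = min(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected

# Example usage: select 5 features shared by 3 tasks (synthetic data).
# X = np.random.randn(100, 20); Y = np.random.randn(100, 3)
# print(greedy_forward_selection(X, Y, k=5))
```

Each greedy step here costs a full RLS fit per remaining candidate; the paper's contribution is precisely to reduce this via incremental matrix algebra updates, so the sketch should be read as a specification of the selection criterion rather than of the fast training algorithm.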
