Multi-label learning under feature extraction budgets

We consider the problem of learning sparse linear models for multi-label prediction tasks under a hard constraint on the number of features used. Such budget constraints are important in domains where acquiring feature values is costly. We propose a greedy multi-label regularized least-squares algorithm that combines greedy forward selection with a cross-validation-based selection criterion to choose which features to include in the model. We present a highly efficient implementation of this procedure with linear time and space complexity, achieved through matrix update formulas that speed up feature addition and cross-validation computations. Experimentally, we demonstrate that the approach finds sparse, accurate predictors on a wide range of benchmark problems, typically outperforming the multi-task lasso baseline when the budget is small.
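To make the selection procedure concrete, the sketch below pairs greedy forward selection with an exact leave-one-out (LOO) criterion for ridge regression, averaged over all labels. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, the LOO computation uses the standard hat-matrix shortcut, and each candidate model is naively refit from scratch, whereas the paper's algorithm avoids that refitting via incremental matrix updates.

```python
import numpy as np

def loo_mse(X, Y, lam=1.0):
    """Exact leave-one-out MSE of ridge regression on X (n, k), Y (n, T),
    via the hat-matrix shortcut e_i = (y_i - yhat_i) / (1 - H_ii)."""
    n, k = X.shape
    G = X.T @ X + lam * np.eye(k)
    H = X @ np.linalg.solve(G, X.T)               # hat matrix, shape (n, n)
    resid = Y - H @ Y                             # in-sample residuals, all labels
    loo_resid = resid / (1.0 - np.diag(H))[:, None]
    return float(np.mean(loo_resid ** 2))

def greedy_multilabel_rls(X, Y, budget, lam=1.0):
    """Forward selection: repeatedly add the feature whose inclusion
    yields the lowest LOO error averaged over all labels."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(budget):
        errs = [loo_mse(X[:, selected + [f]], Y, lam) for f in remaining]
        best = remaining[int(np.argmin(errs))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Illustrative usage on synthetic data with 5 informative features:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 40))
W = np.zeros((40, 3)); W[:5] = rng.normal(size=(5, 3))
Y = X @ W + 0.1 * rng.normal(size=(100, 3))
print(greedy_multilabel_rls(X, Y, budget=5))      # tends to recover features 0..4
```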
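The linear-complexity claim rests on updating, rather than recomputing, the inverse of the regularized Gram matrix when a feature is added, in the spirit of standard inverse-of-a-sum identities. Below is a hedged sketch of one such block-inverse (Schur complement) update, assuming a known inverse for the currently selected columns; the variable names and bookkeeping are illustrative assumptions, not the paper's exact formulas.

```python
import numpy as np

def add_feature_to_inverse(G_inv, XS, x_new, lam):
    """Given G_inv = (XS^T XS + lam*I)^{-1} for the k selected columns XS,
    return the inverse of the enlarged Gram matrix after appending column
    x_new, via a block-inverse update: O(n*k + k^2) instead of O(k^3)."""
    v = XS.T @ x_new                              # cross terms, shape (k,)
    b = G_inv @ v
    s = float(x_new @ x_new) + lam - v @ b        # Schur complement (scalar)
    return np.block([
        [G_inv + np.outer(b, b) / s, -b[:, None] / s],
        [-b[None, :] / s,            np.array([[1.0 / s]])],
    ])

# Quick check against a direct inversion:
rng = np.random.default_rng(0)
X, lam = rng.normal(size=(50, 6)), 1.0
XS, x_new = X[:, :5], X[:, 5]
G_inv = np.linalg.inv(XS.T @ XS + lam * np.eye(5))
updated = add_feature_to_inverse(G_inv, XS, x_new, lam)
assert np.allclose(updated, np.linalg.inv(X.T @ X + lam * np.eye(6)))
```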
