Multi-label learning under feature extraction budgets

We consider the problem of learning sparse linear models for multi-label prediction tasks under a hard constraint on the number of features used. Such budget constraints are important in domains where acquiring feature values is costly. We propose a greedy multi-label regularized least-squares algorithm that combines greedy forward selection with a cross-validation-based selection criterion to choose which features to include in the model. We present a highly efficient implementation of this procedure with linear time and space complexity, achieved through matrix update formulas that speed up feature addition and cross-validation computations. Experimentally, we demonstrate that the approach finds sparse, accurate predictors on a wide range of benchmark problems, typically outperforming the multi-task lasso baseline when the budget is small.
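To make the selection procedure concrete, the sketch below pairs greedy forward selection with an exact leave-one-out (LOO) criterion for ridge regression, averaged over all labels. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, the LOO computation uses the standard hat-matrix shortcut, and each candidate model is naively refit from scratch, whereas the paper's algorithm avoids that refitting via incremental matrix updates.

```python
import numpy as np

def loo_mse(X, Y, lam=1.0):
    """Exact leave-one-out MSE of ridge regression on X (n, k), Y (n, T),
    via the hat-matrix shortcut e_i = (y_i - yhat_i) / (1 - H_ii)."""
    n, k = X.shape
    G = X.T @ X + lam * np.eye(k)
    H = X @ np.linalg.solve(G, X.T)               # hat matrix, shape (n, n)
    resid = Y - H @ Y                             # in-sample residuals, all labels
    loo_resid = resid / (1.0 - np.diag(H))[:, None]
    return float(np.mean(loo_resid ** 2))

def greedy_multilabel_rls(X, Y, budget, lam=1.0):
    """Forward selection: repeatedly add the feature whose inclusion
    yields the lowest LOO error averaged over all labels."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(budget):
        errs = [loo_mse(X[:, selected + [f]], Y, lam) for f in remaining]
        best = remaining[int(np.argmin(errs))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Illustrative usage on synthetic data with 5 informative features:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 40))
W = np.zeros((40, 3)); W[:5] = rng.normal(size=(5, 3))
Y = X @ W + 0.1 * rng.normal(size=(100, 3))
print(greedy_multilabel_rls(X, Y, budget=5))      # tends to recover features 0..4
```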
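The linear-complexity claim rests on updating, rather than recomputing, the inverse of the regularized Gram matrix when a feature is added, in the spirit of standard inverse-of-a-sum identities. Below is a hedged sketch of one such block-inverse (Schur complement) update, assuming a known inverse for the currently selected columns; the variable names and bookkeeping are illustrative assumptions, not the paper's exact formulas.

```python
import numpy as np

def add_feature_to_inverse(G_inv, XS, x_new, lam):
    """Given G_inv = (XS^T XS + lam*I)^{-1} for the k selected columns XS,
    return the inverse of the enlarged Gram matrix after appending column
    x_new, via a block-inverse update: O(n*k + k^2) instead of O(k^3)."""
    v = XS.T @ x_new                              # cross terms, shape (k,)
    b = G_inv @ v
    s = float(x_new @ x_new) + lam - v @ b        # Schur complement (scalar)
    return np.block([
        [G_inv + np.outer(b, b) / s, -b[:, None] / s],
        [-b[None, :] / s,            np.array([[1.0 / s]])],
    ])

# Quick check against a direct inversion:
rng = np.random.default_rng(0)
X, lam = rng.normal(size=(50, 6)), 1.0
XS, x_new = X[:, :5], X[:, 5]
G_inv = np.linalg.inv(XS.T @ XS + lam * np.eye(5))
updated = add_feature_to_inverse(G_inv, XS, x_new, lam)
assert np.allclose(updated, np.linalg.inv(X.T @ X + lam * np.eye(6)))
```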
