A Probabilistic Model for Dirty Multi-task Feature Selection

Multi-task feature selection methods often make the hypothesis that learning tasks share relevant and irrelevant features. However, this hypothesis may be too restrictive in practice. For example, there may be a few tasks with specific relevant and irrelevant features (outlier tasks). Similarly, a few of the features may be relevant for only some of the tasks (outlier features). To account for this, we propose a model for multi-task feature selection based on a robust prior distribution that introduces a set of binary latent variables to identify outlier tasks and outlier features. Expectation propagation can be used for efficient approximate inference under the proposed prior. Several experiments show that a model based on the new robust prior provides better predictive performance than other benchmark methods.

Daniel Hernández-Lobato | Zoubin Ghahramani | José Miguel Hernández-Lobato

[1] James G. Scott, et al. Handling Sparsity via the Horseshoe, 2009, AISTATS.

[2] W. Strawderman. Proper Bayes Minimax Estimators of the Multivariate Normal Mean, 1971.

[3] Daniel Hernández-Lobato, et al. Generalized spike-and-slab priors for Bayesian group feature selection using expectation propagation, 2013, J. Mach. Learn. Res.

[4] Yiming Yang, et al. Flexible latent variable models for multi-task learning, 2008, Machine Learning.

[5] Massimiliano Pontil, et al. Multi-Task Feature Learning, 2006, NIPS.

[6] Tom Minka, et al. A family of algorithms for approximate Bayesian inference, 2001.

[7] Julia E. Vogt, et al. The Group-Lasso: $\ell_{1,\infty}$ Regularization versus $\ell_{1,2}$ Regularization, 2010.

[8] Miguel Lázaro-Gredilla, et al. Spike and Slab Variational Inference for Multi-Task and Multiple Kernel Learning, 2011, NIPS.

[9] Daniel Hernández-Lobato, et al. Learning feature selection dependencies in multi-task learning, 2013, NIPS.

[10] Thibault Helleputte, et al. Expectation Propagation for Bayesian Multi-task Feature Selection, 2010, ECML/PKDD.

[11] Lawrence Carin, et al. Multi-Task Learning for Classification with Dirichlet Process Priors, 2007, J. Mach. Learn. Res.

[12] Ben Taskar, et al. Joint covariate selection and joint subspace selection for multiple classification problems, 2010, Stat. Comput.

[13] M. Seeger. Expectation Propagation for Exponential Families, 2005.

[14] Daniel Hernández-Lobato, et al. Expectation propagation in linear regression models with spike-and-slab priors, 2015, Machine Learning.

[15] Christopher M. Bishop, et al. Pattern Recognition and Machine Learning (Information Science and Statistics), 2006.

[16] Adam A. Margolin, et al. The Cancer Cell Line Encyclopedia enables predictive modeling of anticancer drug sensitivity, 2012, Nature.

[17] T. J. Mitchell, et al. Bayesian Variable Selection in Linear Regression, 1988.

[18] Tom Minka, et al. Expectation Propagation for approximate Bayesian inference, 2001, UAI.

[19] Jacques Wainer, et al. Flexible Modeling of Latent Task Structures in Multitask Learning, 2012, ICML.

[20] Ali Jalali, et al. A Dirty Model for Multi-task Learning, 2010, NIPS.

[21] Radford M. Neal. Pattern Recognition and Machine Learning, 2007, Technometrics.

[22] Jieping Ye, et al. Robust multi-task feature learning, 2012, KDD.

[23] J. Berger. A Robust Generalized Bayes Estimator and Confidence Region for a Multivariate Normal Mean, 1980.

[24] Tom Fawcett, et al. An introduction to ROC analysis, 2006, Pattern Recognit. Lett.

[25] Dario Floreano, et al. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods, 2011, Bioinform.

[26] P. Geurts, et al. Inferring Regulatory Networks from Expression Data Using Tree-Based Methods, 2010, PLoS ONE.

[27] Jinbo Bi, et al. Probabilistic Joint Feature Selection for Multi-task Learning, 2007, SDM.

[28] David B. Dunson, et al. Generalized Beta Mixtures of Gaussians, 2011, NIPS.

[29] Tony Jebara, et al. Multi-task feature and kernel selection for SVMs, 2004, ICML.

[30] Michael I. Jordan, et al. Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces, 2004, J. Mach. Learn. Res.

[31] I. Johnstone, et al. Empirical Bayes selection of wavelet thresholds, 2005, math/0508281.