A Probabilistic Model for Dirty Multi-task Feature Selection

Multi-task feature selection methods often assume that all learning tasks share the same relevant and irrelevant features. In practice, this assumption may be too restrictive. For example, a few tasks may have their own specific relevant and irrelevant features (outlier tasks). Similarly, a few features may be relevant for only some of the tasks (outlier features). To account for this, we propose a model for multi-task feature selection based on a robust prior distribution that introduces a set of binary latent variables to identify outlier tasks and outlier features. Expectation propagation can be used for efficient approximate inference under the proposed prior. Several experiments show that a model based on the new robust prior provides better predictive performance than other benchmark methods.
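
To make the roles of outlier tasks and outlier features concrete, the following is a minimal illustrative sketch of a generative process with binary latent indicators. It is not the prior defined in the paper: the function name `sample_dirty_relevance`, the Bernoulli probabilities, and the Gaussian slab are all assumptions chosen only for illustration. Features are relevant or irrelevant for all tasks at once, except in rows flagged as outlier tasks and columns flagged as outlier features, where relevance is drawn independently per task.

```python
import numpy as np

def sample_dirty_relevance(n_tasks, n_features, p_relevant=0.2,
                           p_outlier_task=0.1, p_outlier_feature=0.1, seed=0):
    """Sample task weights from a hypothetical 'dirty' spike-and-slab process.

    All probabilities and the standard-normal slab are illustrative
    assumptions, not values or choices taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Shared indicator: is feature j relevant for the regular tasks?
    shared = rng.random(n_features) < p_relevant
    # Binary latent variables flagging outlier tasks and outlier features.
    outlier_task = rng.random(n_tasks) < p_outlier_task
    outlier_feature = rng.random(n_features) < p_outlier_feature
    # Task-specific indicators, used only where an outlier flag is set.
    private = rng.random((n_tasks, n_features)) < p_relevant
    use_private = outlier_task[:, None] | outlier_feature[None, :]
    relevant = np.where(use_private, private, shared[None, :])
    # Spike at zero for irrelevant entries, Gaussian slab for relevant ones.
    weights = np.where(relevant, rng.normal(size=(n_tasks, n_features)), 0.0)
    return weights, relevant, shared, outlier_task, outlier_feature
```

In this sketch, setting both outlier probabilities to zero recovers the usual shared-sparsity assumption, in which every task uses the same set of relevant features.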
