Bayesian Group Feature Selection for Support Vector Learning Machines

Group Feature Selection (GFS) has proven useful for improving both the interpretability and the prediction performance of learned models in many machine learning and data mining applications. Existing GFS models are mainly based on the square loss for regression and the logistic loss for classification, leaving the ϵ-insensitive loss and the hinge loss popularized by Support Vector Learning (SVL) machines unexplored. In this paper, we present a Bayesian GFS framework for SVL machines based on pseudo-likelihood and data augmentation. Because the regularization parameters are inferred within the Bayesian model, our method avoids cross-validation for tuning them. Specifically, we apply the mean-field variational method in the augmented space to derive the posterior distributions of the model parameters and hyper-parameters for Bayesian estimation. Regression and classification experiments on synthetic and real-world data sets demonstrate that the proposed approach outperforms a number of competing methods.
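
To make the two SVL losses and the data augmentation idea concrete, the following is a minimal sketch and not the paper's formulation. It assumes the pseudo-likelihood takes the form exp(-2 · loss) and uses the well-known scale-mixture-of-Gaussians identity for the hinge loss from the SVM data augmentation literature; the function names, the default `eps`, and the numerical integration settings are all hypothetical choices made for illustration.

```python
import math

def hinge_loss(y, f):
    """Hinge loss for a label y in {-1, +1} and a real-valued score f."""
    return max(0.0, 1.0 - y * f)

def eps_insensitive_loss(y, f, eps=0.1):
    """Epsilon-insensitive loss: errors smaller than eps cost nothing."""
    return max(0.0, abs(y - f) - eps)

def pseudo_likelihood(loss):
    """Assumed unnormalized pseudo-likelihood exp(-2 * loss)."""
    return math.exp(-2.0 * loss)

def augmented_hinge_likelihood(y, f, t_max=12.0, n=20000):
    """Numerically integrate the augmented form over the latent scale lam:

        integral_0^inf (2*pi*lam)^(-1/2) * exp(-(lam + u)^2 / (2*lam)) dlam,

    with u = 1 - y*f. The substitution lam = t^2 removes the singularity
    at lam = 0, and a midpoint rule avoids evaluating at t = 0.
    """
    u = 1.0 - y * f
    h = t_max / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += math.exp(-((t * t + u) ** 2) / (2.0 * t * t))
    return 2.0 / math.sqrt(2.0 * math.pi) * total * h

if __name__ == "__main__":
    for y, f in [(1, 0.5), (1, 1.3), (-1, 0.2)]:
        direct = pseudo_likelihood(hinge_loss(y, f))
        mixed = augmented_hinge_likelihood(y, f)
        print(f"y={y:+d} f={f:.1f} direct={direct:.6f} augmented={mixed:.6f}")
```

Under these assumptions, the integral over the latent scale should reproduce the hinge pseudo-likelihood, which is what makes the model conditionally Gaussian in the augmented space and therefore amenable to mean-field variational updates.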
