Bayesian Max-margin Multi-Task Learning with Data Augmentation

Max-margin and Bayesian methods have both been studied extensively in multi-task learning, but they have rarely been considered together. We present Bayesian max-margin multi-task learning, which combines the two schools of methods. This lets discriminative max-margin methods benefit from the flexibility of Bayesian methods in incorporating rich prior information and in performing nonparametric Bayesian feature learning, with the latent dimensionality inferred from data. We develop Gibbs sampling algorithms that use data augmentation to handle the non-smooth hinge loss. For nonparametric models, our algorithms require neither mean-field assumptions nor truncated approximations. Empirical results demonstrate performance superior to that of competing methods in both multi-task classification and regression.
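
To illustrate why data augmentation helps with the non-smooth hinge loss, the sketch below numerically checks the scale-mixture identity from Polson and Scott's work on data augmentation for support vector machines: exp(-2 max(0, u)) equals the integral over lambda > 0 of (2 pi lambda)^(-1/2) exp(-(lambda + u)^2 / (2 lambda)). Conditioned on the augmented variable lambda, the hinge term becomes Gaussian in u, which is what makes Gibbs updates tractable. This is only an illustration of that identity, not the paper's sampler; the function names, the substitution lambda = t^2, and the integration settings are assumptions made for this example.

```python
import math


def hinge_pseudo_likelihood(u):
    """Unnormalized likelihood induced by the hinge loss: exp(-2 * max(0, u))."""
    return math.exp(-2.0 * max(0.0, u))


def augmented_integral(u, upper=12.0, steps=20000):
    """Integrate the Gaussian scale mixture over lambda in (0, inf).

    The substitution lambda = t**2 removes the 1/sqrt(lambda) singularity,
    leaving sqrt(2/pi) * exp(-(t**2 + u)**2 / (2 * t**2)) to integrate over t.
    The integral is truncated at t = upper (an assumed cutoff) and
    approximated with the midpoint rule.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h  # midpoint of the i-th subinterval, never exactly 0
        total += math.exp(-((t * t + u) ** 2) / (2.0 * t * t))
    return math.sqrt(2.0 / math.pi) * total * h


if __name__ == "__main__":
    # u plays the role of a margin violation, e.g. 1 - y * f(x) for a classifier.
    for u in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(u, hinge_pseudo_likelihood(u), augmented_integral(u))
```

For each value of u, the two printed numbers should agree closely, for both satisfied margins (u <= 0) and violated margins (u > 0).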
