Flexible latent variable models for multi-task learning

Given multiple prediction problems such as regression or classification, we are interested in a joint inference framework that effectively shares information across tasks to improve prediction accuracy, especially when the number of training examples per task is small. In this paper we propose a probabilistic framework that supports a family of latent variable models for different multi-task learning scenarios. We show that the framework generalizes standard learning methods for single prediction problems and effectively models the structure shared among tasks. Furthermore, we present efficient algorithms for both the empirical Bayes method and point estimation. Experiments on simulated datasets and real-world classification datasets demonstrate the effectiveness of the proposed models in two evaluation settings: standard multi-task learning and transfer learning.
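To make the idea of sharing information through a latent structure concrete, the following is a minimal sketch of one common instance of this family: multi-task linear regression where all tasks draw their weight vectors from a shared Gaussian prior, fit with a simplified empirical-Bayes (EM-style) loop. This is an illustrative toy model, not the paper's actual framework; the function name `multitask_ridge`, the isotropic prior, and all hyperparameters are assumptions made for the example.

```python
import numpy as np

def multitask_ridge(tasks, n_iters=20, noise_var=1.0):
    """Empirical-Bayes sketch for multi-task regression.

    Each task t has data (X_t, y_t); all tasks share a Gaussian prior
    N(mu, tau * I) over their weight vectors. We alternate between
    per-task MAP estimates of the weights (E-step-like) and refitting
    the shared prior from those estimates (M-step-like).
    """
    d = tasks[0][0].shape[1]
    mu = np.zeros(d)                 # shared prior mean
    tau = 1.0                        # shared prior variance (isotropic)
    W = np.zeros((len(tasks), d))    # per-task weight estimates
    for _ in range(n_iters):
        # MAP weights for each task under the current shared prior
        for t, (X, y) in enumerate(tasks):
            A = X.T @ X / noise_var + np.eye(d) / tau
            b = X.T @ y / noise_var + mu / tau
            W[t] = np.linalg.solve(A, b)
        # Refit the shared prior from the current task weights
        mu = W.mean(axis=0)
        tau = max(((W - mu) ** 2).mean(), 1e-6)
    return W, mu, tau
```

With few examples per task, each task's MAP estimate is shrunk toward the shared mean `mu`, which is exactly the kind of cross-task information transfer the abstract describes; richer members of the family replace the isotropic prior with learned latent structure.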
