A new transfer learning framework with application to model-agnostic multi-task learning

Learning from a small number of examples is a challenging problem in machine learning. An effective way to improve performance is to exploit knowledge from other related tasks. Multi-task learning (MTL) is one such paradigm, aiming to improve performance by jointly modeling multiple related tasks. Although the machine learning literature offers numerous classification and regression models, most MTL models are built around ridge or logistic regression. A few works propose multi-task extensions of specific techniques such as support vector machines or Gaussian processes. However, all of these MTL models are tied to particular classification or regression algorithms, and no single MTL algorithm exists that can be applied at a meta level to an arbitrary learning algorithm. Addressing this gap, we propose a generic, model-agnostic joint modeling framework that can take any classification or regression algorithm of a practitioner's choice (standard or custom-built) and build its MTL variant. The key observation driving our framework is that, with few examples, the estimates of task parameters are usually poor; we show that this leads, with high probability, to an under-estimation of the relatedness between any two tasks. We derive an algorithm that brings the tasks closer to their true relatedness by improving the estimates of the task parameters, achieved through appropriate sharing of data across tasks. We provide detailed theoretical underpinnings for the algorithm. Through experiments on both synthetic and real datasets, we demonstrate that the multi-task variants of several classifiers and regressors (logistic regression, support vector machines, k-nearest neighbors, random forests, ridge regression, and support vector regression) convincingly outperform their single-task counterparts. We also show that the proposed model performs comparably to, or better than, many state-of-the-art MTL and transfer learning baselines.
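The abstract leaves the relatedness measure and the sharing rule unspecified, so the following is only a minimal sketch of the recipe it describes: fit single-task models, estimate pairwise task relatedness from those (noisy) fits, and refit each task on its own data pooled with data from sufficiently related tasks. Cross-task predictive accuracy as the relatedness proxy, the `share_threshold` pooling rule, and the helper `fit_mtl_variant` are all hypothetical choices made for illustration, not the paper's actual algorithm.

```python
# Minimal sketch only: the relatedness measure (cross-task accuracy) and the
# threshold-based pooling rule are ASSUMPTIONS for illustration; the paper's
# abstract does not specify them.
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier

def fit_mtl_variant(task_data, base_estimator=None, share_threshold=0.7):
    """task_data: list of (X, y) pairs, one per task."""
    if base_estimator is None:
        base_estimator = RandomForestClassifier(n_estimators=100)

    # Step 1: single-task fits. With few examples per task, these estimates
    # (and any relatedness derived from them) are poor.
    single = [clone(base_estimator).fit(X, y) for X, y in task_data]

    # Step 2: proxy for pairwise task relatedness -- how well task s's model
    # predicts task t's labels (an assumed, model-agnostic surrogate).
    T = len(task_data)
    relatedness = np.zeros((T, T))
    for t, (X, y) in enumerate(task_data):
        for s in range(T):
            relatedness[t, s] = single[s].score(X, y)

    # Step 3: refit each task on its own data pooled with data borrowed from
    # sufficiently related tasks, improving the small-sample estimates.
    models = []
    for t, (X, y) in enumerate(task_data):
        Xs, ys = [X], [y]
        for s, (Xo, yo) in enumerate(task_data):
            if s != t and relatedness[t, s] >= share_threshold:
                Xs.append(Xo)
                ys.append(yo)
        models.append(clone(base_estimator).fit(np.vstack(Xs),
                                                np.concatenate(ys)))
    return models, relatedness

# Usage: models, R = fit_mtl_variant([(X1, y1), (X2, y2), (X3, y3)])
```

Because the base estimator is only used through `fit` and `score`, any scikit-learn-style classifier can be dropped in, which reflects the model-agnostic property the abstract emphasizes; the fixed threshold here merely stands in for the theoretically guided sharing mechanism the paper derives.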
