Two Algorithms for Transfer Learning

Transfer learning aims to improve performance on a target task by exploiting what has already been learned on one or more source tasks. This chapter introduces two transfer learning algorithms that apply when the source and target domains share the same feature space and class labels. The first is a hierarchical Bayesian extension of naive Bayes; the second is a variant of logistic regression in which the prior distribution over the weights is learned from an ensemble of source tasks. Both methods are evaluated on a real-world task: predicting whether a person will accept or decline a meeting invitation. The results show consistently successful transfer whenever an ensemble of source tasks is available.
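
To make the first algorithm concrete, here is a minimal sketch of one plausible hierarchical-Bayes treatment of naive Bayes, assuming binary (Bernoulli) features and NumPy; the function names, the pooling scheme, and the `strength` hyperparameter are illustrative assumptions, not the chapter's actual method. Each source task contributes smoothed per-class feature rates; their average becomes the mean of a Beta prior whose pseudo-counts then shrink the target task's estimates toward the source ensemble.

```python
import numpy as np

def nb_counts(X, y, n_classes):
    """Per-class feature counts and class totals for binary features."""
    counts = np.zeros((n_classes, X.shape[1]))
    totals = np.zeros(n_classes)
    for c in range(n_classes):
        counts[c] = X[y == c].sum(axis=0)
        totals[c] = (y == c).sum()
    return counts, totals

def hierarchical_nb(source_tasks, X_tgt, y_tgt, n_classes=2, strength=5.0):
    """Pool source-task feature rates into Beta pseudo-counts (the shared
    prior), then take the Beta-Bernoulli posterior mean on the target task.
    NOTE: an illustrative sketch, not the chapter's exact algorithm."""
    rates = []
    for Xs, ys in source_tasks:
        c, t = nb_counts(Xs, ys, n_classes)
        rates.append((c + 1.0) / (t[:, None] + 2.0))   # Laplace-smoothed
    prior_rate = np.mean(rates, axis=0)                # shared prior mean
    a = strength * prior_rate                          # Beta pseudo-counts
    b = strength * (1.0 - prior_rate)
    c_tgt, t_tgt = nb_counts(X_tgt, y_tgt, n_classes)
    theta = (c_tgt + a) / (t_tgt[:, None] + a + b)     # posterior mean
    class_prior = (t_tgt + 1.0) / (t_tgt.sum() + n_classes)
    return theta, class_prior
```

The `strength` parameter plays the role of the hyperprior's weight: a large value pins the target parameters to the pooled source rates, while a small value lets the (possibly scarce) target data dominate.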
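For the second algorithm, the sketch below (again in Python with NumPy, with all names being illustrative assumptions) fits a weakly regularized logistic regression to each source task, summarizes the per-weight mean and variance across those fits, and uses that Gaussian summary as the prior for MAP estimation on the target task.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, prior_mean, prior_var, lr=0.1, iters=500):
    """MAP logistic regression under an independent Gaussian prior
    N(prior_mean, prior_var) on each weight, via gradient ascent."""
    w = prior_mean.copy()
    for _ in range(iters):
        p = sigmoid(X @ w)
        grad = X.T @ (y - p) - (w - prior_mean) / prior_var
        w += lr * grad / len(y)
    return w

def transfer_logreg(source_tasks, X_tgt, y_tgt):
    """Learn a per-weight Gaussian prior from source-task weights,
    then do MAP estimation on the (small) target task.
    NOTE: a sketch of the idea, not the chapter's exact procedure."""
    d = X_tgt.shape[1]
    flat = np.zeros(d)            # neutral prior mean for source fits
    wide = np.full(d, 100.0)      # weak (high-variance) prior
    W = np.stack([fit_logreg(Xs, ys, flat, wide)
                  for Xs, ys in source_tasks])
    prior_mean = W.mean(axis=0)
    prior_var = W.var(axis=0) + 1e-3    # guard against zero variance
    return fit_logreg(X_tgt, y_tgt, prior_mean, prior_var)
```

The transfer happens in the per-weight variances: a weight that is stable across the source ensemble gets a tight prior and is effectively pinned, while a weight that varies across sources gets a loose prior and is left for the target data to determine.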
