Inductive transfer with context-sensitive neural networks

Context-sensitive Multiple Task Learning, or csMTL, is presented as a method of inductive transfer that uses a single-output neural network and additional contextual inputs for learning multiple tasks. The csMTL encoding of multiple-task examples was developed in response to problems that arise when standard MTL networks are applied to machine lifelong learning systems, and it was found to improve predictive performance. As evidence, the csMTL method is tested on seven task domains and shown to produce hypotheses for primary tasks that are often better than standard MTL hypotheses when learning in the presence of both related and unrelated tasks. We argue that this performance improvement stems from a reduction in the number of effective free parameters in the csMTL network, brought about by the shared output node and the weight-update constraints imposed by the context inputs. An examination of inductive decision tree (IDT) and support vector machine (SVM) models developed from csMTL-encoded data provides initial evidence that this improvement does not carry over to all machine learning models.

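To make the encoding concrete, here is a minimal sketch of a csMTL-style network in PyTorch. The layer sizes, task count, and activation choices are illustrative assumptions, not settings taken from the paper; the essential points are that task identity enters only through extra one-hot context inputs and that all tasks share the same single output node.

```python
# Minimal csMTL-style sketch. Layer sizes, task count, and activations
# are illustrative assumptions, not the paper's experimental settings.
import torch
import torch.nn as nn

n_primary = 10  # primary input features (assumed)
n_tasks = 3     # tasks, encoded as one-hot context inputs (assumed)

# One shared network with a SINGLE output node; task identity enters
# only through the context inputs appended to the primary inputs.
model = nn.Sequential(
    nn.Linear(n_primary + n_tasks, 20),  # context inputs widen the input layer
    nn.Tanh(),
    nn.Linear(20, 1),                    # single shared output node
    nn.Sigmoid(),
)

def encode(x, task_id):
    """Append a one-hot task-context vector to the primary inputs."""
    c = torch.zeros(x.shape[0], n_tasks)
    c[:, task_id] = 1.0
    return torch.cat([x, c], dim=1)

# The same primary inputs yield task-specific predictions purely
# because the context inputs differ.
x = torch.randn(4, n_primary)
y_task0 = model(encode(x, 0))
y_task1 = model(encode(x, 1))
```

Because every task's examples pass through the same weights and the same output node, an update driven by one task's example is constrained by how those weights must also serve the other task contexts; this sharing is the effective-free-parameter reduction the abstract credits for the performance improvement.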