Selective Transfer of Task Knowledge Using Stochastic Noise

The selective transfer of task knowledge within the context of artificial neural networks is studied using sMTL, a modified version of the previously reported ηMTL (multiple task learning) method. sMTL is a knowledge-based inductive learning system that uses prior task knowledge and stochastic noise to adjust its inductive bias when learning a new task. The MTL representation of previously learned and consolidated tasks is used as the starting point for learning a new primary task. Task rehearsal ensures the stability of related secondary task knowledge within the sMTL network, while stochastic noise creates plasticity in the network so that the new task can be learned. sMTL controls the level of noise applied to each secondary task based on a measure of secondary-to-primary task relatedness. Experiments demonstrate that, from impoverished training sets, sMTL uses the prior representations to quickly develop predictive models that have (1) generalization ability superior to that of models produced by single task learning or standard MTL and (2) generalization ability equivalent to that of models produced by ηMTL.
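
A minimal sketch of the mechanism described above, assuming one shared hidden layer, one output unit per task, rehearsal targets supplied by previously consolidated task models, and Gaussian noise added to each secondary task's rehearsal target with a variance that grows as relatedness falls. All names, dimensions, and hyperparameters below are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical sketch of noise-controlled task rehearsal in an MTL network.
# Assumptions: shared hidden layer, one sigmoid output per task, squared-error
# backprop, and per-task noise scaled by (1 - relatedness). Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hid, n_tasks = 5, 8, 3           # task 0 = primary, tasks 1..2 = secondary
relatedness = np.array([1.0, 0.9, 0.2])  # hypothetical secondary-to-primary relatedness
W1 = rng.normal(0, 0.1, (n_hid, n_in))   # shared hidden weights (consolidated MTL start point)
W2 = rng.normal(0, 0.1, (n_tasks, n_hid))
lr, noise_scale = 0.1, 0.3

def train_step(x, targets):
    """One backprop step over all task outputs; noisy rehearsal for secondary tasks."""
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)
    # Stochastic noise on secondary rehearsal targets:
    # less related -> larger noise variance -> more plasticity for the primary task.
    noisy = targets.copy()
    noisy[1:] += rng.normal(0.0, noise_scale * (1.0 - relatedness[1:]))
    err = (y - noisy) * y * (1 - y)                    # squared-error gradient at the outputs
    grad_W2 = np.outer(err, h)
    grad_W1 = np.outer((W2.T @ err) * h * (1 - h), x)
    return grad_W1, grad_W2

for _ in range(100):                                   # toy training loop on random data
    x = rng.normal(size=n_in)
    targets = rng.uniform(size=n_tasks)                # primary target + rehearsed secondary targets
    g1, g2 = train_step(x, targets)
    W1 -= lr * g1
    W2 -= lr * g2
```

Under this reading, a highly related secondary task is rehearsed almost exactly, preserving the shared representation (stability), while a weakly related one receives noisy targets that leave the shared weights freer to change in favour of the primary task (plasticity).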
