Meta-Learning Related Tasks with Recurrent Networks: Optimization and Generalization

There has been recent interest in meta-learning systems, i.e., networks that are trained to learn across multiple tasks. This paper focuses on the optimization and generalization of a meta-learning system based on recurrent networks. The optimization study investigates the influence of different structures and parameters on the system's performance. For generalization, we demonstrate the robustness of our meta-learning system in learning across multiple tasks, including tasks unseen during the meta-training phase. We introduce a meta-cost function, the Mean Squared Fair Error (MSFE), which enhances the performance of the system by not penalizing it during transitions to learning a new task. Evaluation results are presented for Boolean and quadratic function datasets. The best performance is obtained using a Long Short-Term Memory (LSTM) topology without a forget gate and with a clipped memory cell. The results demonstrate i) the impact of different LSTM architectures, parameters, and error functions on the meta-learning process; ii) that the Mean Squared Fair Error function improves learning performance; and iii) the robustness of our meta-learning framework, which generalizes well when tested on tasks unseen during meta-training. A comparison between the No-Forget-Gate LSTM and the Gated Recurrent Unit also suggests that the absence of a memory cell tends to degrade performance.
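
The exact form of the Mean Squared Fair Error is not given here; a minimal reconstruction consistent with the description above, in which a window of k steps after each task switch is excluded from the penalty, would be (the window length k and the switch-time notation \tau(t) are assumptions for illustration, not the paper's notation):

    \mathrm{MSFE} = \frac{1}{|\mathcal{F}|} \sum_{t \in \mathcal{F}} \left( y_t - \hat{y}_t \right)^2,
    \qquad \mathcal{F} = \{\, t : t - \tau(t) \ge k \,\}

where \hat{y}_t is the network's prediction at step t and \tau(t) is the step of the most recent task switch at or before t. Ordinary mean squared error is recovered with k = 0.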

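For concreteness, here is a minimal NumPy sketch of the best-performing cell described above: a standard LSTM step with the forget gate removed (equivalently, fixed to 1) and the memory cell clipped to a bounded range. The gate layout and the clip bound are illustrative assumptions; the paper's exact parameterization may differ.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def nfg_lstm_step(x, h, c, W, b, clip=1.0):
        """One step of a No-Forget-Gate LSTM with a clipped memory cell.

        x: input vector; h, c: previous hidden and cell states.
        W, b: stacked weights and biases for the input gate, candidate
        memory, and output gate (3 * hidden_size rows). The `clip`
        bound on the cell state is an assumed value for illustration.
        """
        n = h.shape[0]
        z = W @ np.concatenate([x, h]) + b
        i = sigmoid(z[:n])        # input gate
        g = np.tanh(z[n:2 * n])   # candidate memory
        o = sigmoid(z[2 * n:])    # output gate
        # No forget gate: the previous cell state is retained in full,
        # so clipping keeps the accumulating cell state bounded.
        c_new = np.clip(c + i * g, -clip, clip)
        h_new = o * np.tanh(c_new)
        return h_new, c_new

A Gated Recurrent Unit, by contrast, has no separate cell state c at all, which is the architectural difference the comparison in the abstract points to.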