Been There, Done That: Meta-Learning with Episodic Recall

Meta-learning agents excel at rapidly learning new tasks from open-ended task distributions; yet, they forget what they learn about each task as soon as the next begins. When tasks recur, as they do in natural environments, meta-learning agents must explore again instead of immediately exploiting previously discovered solutions. We propose a formalism for generating open-ended yet repetitious environments, then develop a meta-learning architecture for solving these environments. This architecture melds the standard LSTM working memory with a differentiable neural episodic memory. We explore the capabilities of agents with this episodic LSTM in five meta-learning environments with recurring tasks, ranging from bandits to navigation and stochastic sequential decision problems.
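The episodic-recall mechanism can be sketched minimally: a key-value store whose keys are task contexts and whose values are saved LSTM cell states, with retrieved states gated back into working memory. The names `EpisodicMemory` and `reinstate`, the softmax-over-distances read, and the scalar blending gate below are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

class EpisodicMemory:
    """Key-value episodic store: keys are task contexts,
    values are previously saved LSTM cell states."""

    def __init__(self):
        self.keys, self.values = [], []

    def write(self, key, cell_state):
        # Save the current cell state under the task context key.
        self.keys.append(np.asarray(key, dtype=float))
        self.values.append(np.asarray(cell_state, dtype=float))

    def read(self, query):
        # Similarity-weighted average of stored cell states,
        # using a softmax over negative squared distances.
        if not self.keys:
            return None
        q = np.asarray(query, dtype=float)
        dists = np.array([np.sum((q - k) ** 2) for k in self.keys])
        w = np.exp(-dists)
        w /= w.sum()
        return sum(wi * v for wi, v in zip(w, self.values))

def reinstate(working_cell, retrieved_cell, gate=0.5):
    """Blend a retrieved episodic cell state into working memory;
    a learned, element-wise gate would replace the scalar here."""
    if retrieved_cell is None:
        return working_cell
    return (1.0 - gate) * working_cell + gate * retrieved_cell
```

In this sketch, encountering a familiar context retrieves the cell state saved when that task was last solved, letting the agent exploit immediately rather than re-explore.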
