Paper information - Been There, Done That: Meta-Learning with Episodic Recall

Meta-learning agents excel at rapidly learning new tasks from open-ended task distributions, yet they forget what they learn about each task as soon as the next begins. When tasks reoccur, as they do in natural environments, meta-learning agents must explore again instead of immediately exploiting previously discovered solutions. We propose a formalism for generating open-ended yet repetitious environments, then develop a meta-learning architecture for solving these environments. This architecture melds the standard LSTM working memory with a differentiable neural episodic memory. We explore the capabilities of agents with this episodic LSTM in five meta-learning environments with reoccurring tasks, ranging from bandits to navigation and stochastic sequential decision problems.
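
To make the idea of an "open-ended yet repetitious" task stream concrete, the following is a minimal sketch and not the paper's formalism. It assumes an urn-style rule in which each episode introduces a brand-new task with probability `alpha / (n + alpha)` and otherwise repeats an earlier task in proportion to how often it has appeared. The names `sample_task_sequence`, `EpisodicStore`, `count_recalls`, and the parameter `alpha` are hypothetical. The store is a plain dictionary standing in for the differentiable episodic memory described in the abstract; it only shows where recall could replace renewed exploration.

```python
import random


def sample_task_sequence(num_episodes, alpha=1.0, seed=0):
    """Sample task ids with an urn-style rule (an assumption, not the paper's method).

    At episode n (0-based), a new task is created with probability
    alpha / (n + alpha); otherwise a previously seen task is repeated
    with probability proportional to its past count. Requires alpha > 0.
    """
    rng = random.Random(seed)
    counts = []    # counts[k] = number of times task k has appeared so far
    sequence = []
    for n in range(num_episodes):
        if rng.random() < alpha / (n + alpha):
            # Open-ended part: introduce a task never seen before.
            counts.append(1)
            sequence.append(len(counts) - 1)
        else:
            # Repetitious part: revisit an old task, favoring frequent ones.
            task = rng.choices(range(len(counts)), weights=counts)[0]
            counts[task] += 1
            sequence.append(task)
    return sequence


class EpisodicStore:
    """Non-differentiable stand-in for an episodic memory keyed by task context."""

    def __init__(self):
        self._slots = {}

    def write(self, context, state):
        self._slots[context] = state

    def read(self, context):
        # Returns None when the context has never been stored.
        return self._slots.get(context)


def count_recalls(sequence):
    """Count episodes in which a stored state for the current task already exists."""
    store = EpisodicStore()
    recalls = 0
    for task in sequence:
        if store.read(task) is not None:
            recalls += 1  # the agent could exploit instead of exploring again
        store.write(task, f"solution-for-task-{task}")
    return recalls


sequence = sample_task_sequence(num_episodes=50, alpha=1.0, seed=0)
print("distinct tasks:", len(set(sequence)))
print("episodes with a stored solution available:", count_recalls(sequence))
```

Under this assumed rule, larger values of `alpha` produce more novel tasks and fewer repeats, so the fraction of episodes in which recall is possible shrinks.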
