Deep Online Learning via Meta-Learning: Continual Adaptation for Model-Based RL

Humans and animals can learn complex predictive models that allow them to reason accurately and reliably about real-world phenomena, and they can adapt such models extremely quickly in the face of unexpected changes. Deep neural network models can represent very complex functions, but they lack this capacity for rapid online adaptation. The goal of this paper is to develop a method for continual online learning from an incoming stream of data, using deep neural network models. We formulate an online learning procedure that uses stochastic gradient descent (SGD) to update model parameters, together with an expectation maximization algorithm with a Chinese restaurant process prior that develops and maintains a mixture of models to handle non-stationary task distributions. This allows all models to be adapted as necessary, with new models instantiated when the task changes and old models recalled when previously seen tasks are encountered again. Furthermore, we observe that meta-learning can be used to meta-train a model such that this direct online adaptation with SGD is effective, which is otherwise not the case for large function approximators. In this work, we apply our meta-learning for online learning (MOLe) approach to model-based reinforcement learning, where adapting the predictive model is critical for control. We demonstrate that MOLe outperforms alternative prior methods and enables effective continuous adaptation in non-stationary task distributions such as varying terrains, motor failures, and unexpected disturbances.
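The mixture-of-models procedure described above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes scalar linear models y = w * x in place of deep networks, a unit-variance Gaussian likelihood, and a fixed scalar `prior_weight` standing in for meta-learned initial parameters. The function names and the values of `alpha` and `lr` are hypothetical choices made only for illustration.

```python
import math

def crp_prior(counts, alpha):
    """CRP prior over existing tasks, plus one new-task slot as the last entry."""
    total = sum(counts) + alpha
    return [c / total for c in counts] + [alpha / total]

def gaussian_likelihood(pred, target, sigma=1.0):
    """Density of target under a Gaussian centred on pred (assumed noise model)."""
    return math.exp(-0.5 * ((pred - target) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def online_step(models, counts, prior_weight, x, y, alpha=1.0, lr=0.1):
    """One EM-style update on a single (x, y) pair; returns new (models, counts)."""
    # A candidate new task starts from the prior parameters.
    candidates = models + [prior_weight]
    prior = crp_prior(counts, alpha)

    # E-step: posterior responsibility of each task for this data point.
    joint = [p * gaussian_likelihood(w * x, y) for p, w in zip(prior, candidates)]
    z = sum(joint)
    resp = [j / z for j in joint]

    if resp.index(max(resp)) == len(models):
        # The new task explains the data best, so instantiate it.
        models, counts = candidates, counts + [0.0]
    else:
        # Otherwise discard the candidate and renormalise over existing tasks.
        kept = sum(resp[:-1])
        resp = [r / kept for r in resp[:-1]]

    # M-step: one responsibility-weighted SGD step per model on squared error.
    models = [w - lr * r * (w * x - y) * x for w, r in zip(models, resp)]
    counts = [c + r for c, r in zip(counts, resp)]
    return models, counts

def run_stream(stream, prior_weight=0.0):
    """Process a stream of (x, y) pairs starting with no models."""
    models, counts = [], []
    for x, y in stream:
        models, counts = online_step(models, counts, prior_weight, x, y)
    return models, counts
```

Feeding this sketch a stream that follows y = 2x, then y = -2x, then y = 2x again should create one model per regime and reuse the first model when the first regime returns, mirroring the instantiate-and-recall behaviour the abstract describes. The hard argmax rule for creating a new task is a simplification chosen here for brevity.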
