Block Contextual MDPs for Continual Learning

In reinforcement learning (RL), when defining a Markov Decision Process (MDP), the environment dynamics are implicitly assumed to be stationary. This stationarity assumption, while simplifying, is unrealistic in many scenarios. In continual reinforcement learning, the sequence of tasks introduces a further source of nonstationarity. In this work, we propose to examine the continual reinforcement learning setting through the block contextual MDP (BC-MDP) framework, which enables us to relax the stationarity assumption. This framework challenges RL algorithms to handle both nonstationarity and rich observation settings and, by additionally leveraging smoothness properties, allows us to study generalization bounds for this setting. Finally, we take inspiration from adaptive control to propose a novel algorithm that addresses the challenges introduced by this more realistic BC-MDP setting, allows for zero-shot adaptation at evaluation time, and achieves strong performance on several nonstationary environments.
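
To make the setting concrete, the following is a minimal formal sketch of a BC-MDP; the notation is illustrative and chosen for exposition here, not a verbatim definition from the paper body.

A BC-MDP can be written as a family of MDPs indexed by a context $c \in \mathcal{C}$,
\[
  \mathcal{M}(c) = \big(\mathcal{S}, \mathcal{A}, \mathcal{X}, p_c, r_c, q_c\big),
\]
sharing a latent state space $\mathcal{S}$, action space $\mathcal{A}$, and rich observation space $\mathcal{X}$, with context-dependent transitions $p_c(s' \mid s, a)$, rewards $r_c(s, a)$, and an emission distribution $q_c(x \mid s)$ whose supports are disjoint across latent states (the block assumption: each observation is generated by exactly one latent state). Nonstationarity is modeled by letting the context evolve over time, $c_1, c_2, \dots$, and smoothness assumptions of the form
\[
  |r_c(s,a) - r_{c'}(s,a)| \le L_r\, d(c, c'), \qquad
  W\big(p_c(\cdot \mid s,a),\, p_{c'}(\cdot \mid s,a)\big) \le L_p\, d(c, c'),
\]
with $d$ a metric on $\mathcal{C}$, $W$ the Wasserstein distance, and $L_r, L_p$ Lipschitz constants, are what make generalization bounds across nearby contexts possible.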
