Causal Dynamics Learning for Task-Independent State Abstraction

Learning an accurate dynamics model is a central goal of Model-Based Reinforcement Learning (MBRL), but most MBRL methods learn a dense dynamics model that is vulnerable to spurious correlations and therefore generalizes poorly to unseen states. In this paper, we introduce Causal Dynamics Learning for Task-Independent State Abstraction (CDL), which first learns a theoretically grounded causal dynamics model that removes unnecessary dependencies between state variables and the action, and thus generalizes well to unseen states. A state abstraction can then be derived from the learned dynamics; it not only improves sample efficiency but also applies to a wider range of tasks than existing state abstraction methods. When evaluated on two simulated environments and their downstream tasks, both the dynamics model and the policies learned by the proposed method generalize well to unseen states, and the derived state abstraction improves sample efficiency compared to learning without it.
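To make the idea of a sparse, causal dynamics model concrete, below is a minimal sketch, not the authors' implementation: a factored dynamics model that predicts each next-state variable only from its causal parents, given a binary adjacency mask over state variables and the action, plus a simple rule that keeps only the state variables reachable from the action as a stand-in for a task-independent abstraction. The class and function names (`MaskedFactoredDynamics`, `controllable_abstraction`) and the assumption that the mask is already known or learned elsewhere are hypothetical.

```python
# Minimal sketch (assumed interfaces, not the paper's code): a factored dynamics
# model masked by a causal adjacency matrix, and a toy abstraction rule.
import torch
import torch.nn as nn


class MaskedFactoredDynamics(nn.Module):
    def __init__(self, num_state_vars, action_dim, hidden_dim=64):
        super().__init__()
        self.num_state_vars = num_state_vars
        self.action_dim = action_dim
        # adjacency[i, j] = 1 if input j (a state variable, or the action in the
        # last column) is a causal parent of next-state variable i.
        # Here it is all ones; in practice it would be learned or tested for.
        self.register_buffer(
            "adjacency", torch.ones(num_state_vars, num_state_vars + 1)
        )
        # One small predictor per next-state variable (factored dynamics).
        self.predictors = nn.ModuleList(
            nn.Sequential(
                nn.Linear(num_state_vars + action_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, 1),
            )
            for _ in range(num_state_vars)
        )

    def forward(self, state, action):
        # state: (batch, num_state_vars), action: (batch, action_dim)
        preds = []
        for i, net in enumerate(self.predictors):
            state_mask = self.adjacency[i, : self.num_state_vars]   # per-variable 0/1
            action_mask = self.adjacency[i, self.num_state_vars]    # scalar 0/1
            inp = torch.cat([state * state_mask, action * action_mask], dim=-1)
            preds.append(net(inp))
        return torch.cat(preds, dim=-1)  # predicted next state, one value per variable


def controllable_abstraction(adjacency, num_state_vars):
    """Keep state variables reachable from the action through the causal graph.

    A simplified, hypothetical stand-in for deriving a task-independent
    abstraction from a learned causal dynamics model.
    """
    reachable = set()
    frontier = {
        j for j in range(num_state_vars) if adjacency[j, num_state_vars] > 0
    }  # variables directly affected by the action
    while frontier:
        reachable |= frontier
        frontier = {
            i for i in range(num_state_vars)
            if any(adjacency[i, j] > 0 for j in reachable)
        } - reachable
    return sorted(reachable)
```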
