Action-Sufficient State Representation Learning for Control with Structural Constraints

Perceived signals in real-world scenarios are usually high-dimensional and noisy, and finding and using a representation that contains the essential and sufficient information required by downstream decision-making tasks can improve computational efficiency and generalization. In this paper, we focus on partially observable environments and propose to learn a minimal set of state representations that capture sufficient information for decision making, termed Action-Sufficient state Representations (ASRs). We build a generative environment model of the structural relationships among variables in the system and present a principled way to characterize ASRs based on structural constraints and the goal of maximizing cumulative reward in policy learning. We then develop a structured sequential Variational Auto-Encoder to estimate the environment model and extract ASRs. Our empirical results on CarRacing and VizDoom demonstrate a clear advantage of learning and using ASRs for policy learning. Moreover, the estimated environment model and ASRs allow learning behaviors from imagined outcomes in the compact latent space, further improving sample efficiency.
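To make the setup concrete, below is a minimal sketch of how a structured sequential VAE with a selection mask over latent dimensions might be implemented. This is not the authors' implementation: the module names, dimensionalities, and the soft-mask mechanism standing in for the structural constraints are illustrative assumptions only.

```python
# Illustrative sketch (assumptions, not the paper's code): a sequential VAE that
# infers latent states s_t from observations o_t and previous actions a_{t-1},
# and applies a learned soft mask to select an action-sufficient subset of the
# latent dimensions (a stand-in for the ASR / structural-constraint idea).
import torch
import torch.nn as nn

class StructuredSeqVAE(nn.Module):
    def __init__(self, obs_dim=64, act_dim=3, latent_dim=32):
        super().__init__()
        self.encoder = nn.GRU(obs_dim + act_dim, 128, batch_first=True)
        self.to_mu = nn.Linear(128, latent_dim)      # posterior mean of q(s_t | o_{<=t}, a_{<t})
        self.to_logvar = nn.Linear(128, latent_dim)  # posterior log-variance
        self.decoder = nn.Sequential(                # reconstructs o_t from the full latent state
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, obs_dim))
        self.reward_head = nn.Linear(latent_dim, 1)  # predicts r_t from the masked (action-sufficient) state
        # Learnable logits for a relaxed selection mask over latent dimensions;
        # a sparsity penalty on this mask plays the role of the structural constraint.
        self.mask_logits = nn.Parameter(torch.zeros(latent_dim))

    def forward(self, obs, prev_act):
        # obs: (B, T, obs_dim), prev_act: (B, T, act_dim)
        h, _ = self.encoder(torch.cat([obs, prev_act], dim=-1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        s = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        mask = torch.sigmoid(self.mask_logits)                # soft ASR selection
        return self.decoder(s), self.reward_head(s * mask), mu, logvar, mask

def loss_fn(obs, reward, obs_recon, reward_pred, mu, logvar, mask, sparsity_weight=1e-3):
    recon = ((obs - obs_recon) ** 2).mean()                        # observation reconstruction
    reward_loss = ((reward - reward_pred.squeeze(-1)) ** 2).mean() # reward prediction from masked state
    kl = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp())).mean()   # KL to a standard normal prior
    sparsity = mask.mean()                                         # encourage a minimal set of state dimensions
    return recon + reward_loss + kl + sparsity_weight * sparsity
```

In this sketch, the reward head only sees the masked latent dimensions, so minimizing the reward-prediction loss together with the mask-sparsity penalty pressures the model to route reward-relevant (action-sufficient) information through as few latent dimensions as possible, while the reconstruction term keeps the full latent state a faithful model of the environment.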
