Learning Causal State Representations of Partially Observable Environments

Intelligent agents can cope with sensory-rich environments by learning task-agnostic state abstractions. In this paper, we propose mechanisms to approximate causal states, which optimally compress the joint history of actions and observations in partially observable Markov decision processes. Our proposed algorithm extracts causal state representations from RNNs that are trained to predict subsequent observations given the history. We demonstrate that these learned task-agnostic state abstractions can be used to efficiently learn policies for reinforcement learning problems with rich observation spaces. We evaluate agents on multiple partially observable navigation tasks with both discrete (GridWorld) and continuous (VizDoom, ALE) observation processes that cannot be solved by traditional memory-limited methods. Our experiments show that DQN and tabular agents equipped with approximate causal state representations systematically improve over recurrent-DQN baselines trained on raw inputs.
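
To make the pipeline described above concrete, the sketch below shows one possible instantiation, not the paper's exact method: a recurrent network is trained to predict the next observation from the history of actions and observations, and its hidden states are then discretized (here via k-means clustering) into approximate causal states that a tabular or DQN agent could consume. All dimensions, names, and the clustering step are illustrative assumptions.

```python
# Hypothetical sketch: next-observation prediction with a GRU, followed by
# discretization of hidden states into approximate causal states.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

OBS_DIM, ACT_DIM, HIDDEN_DIM, NUM_STATES = 16, 4, 64, 32  # assumed sizes

class NextObsPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(OBS_DIM + ACT_DIM, HIDDEN_DIM, batch_first=True)
        self.head = nn.Linear(HIDDEN_DIM, OBS_DIM)

    def forward(self, obs, act):
        # obs, act: (batch, time, dim); the RNN summarizes the joint history.
        h, _ = self.rnn(torch.cat([obs, act], dim=-1))
        return self.head(h), h

model = NextObsPredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Placeholder rollouts; in practice these come from the environment.
obs = torch.randn(8, 50, OBS_DIM)
act = torch.randn(8, 50, ACT_DIM)

for _ in range(100):
    pred, _ = model(obs[:, :-1], act[:, :-1])
    loss = nn.functional.mse_loss(pred, obs[:, 1:])  # next-observation loss
    opt.zero_grad()
    loss.backward()
    opt.step()

# Approximate causal states: cluster hidden states so that histories with
# similar predicted futures map to the same discrete state index.
with torch.no_grad():
    _, hidden = model(obs[:, :-1], act[:, :-1])
states = KMeans(n_clusters=NUM_STATES).fit_predict(
    hidden.reshape(-1, HIDDEN_DIM).numpy())
```

The resulting discrete state indices could then replace raw observations as inputs to a tabular learner or a DQN, which is the role the learned abstractions play in the evaluation described above.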
