Memory Lens: How Much Memory Does an Agent Use?

We propose a new method to study the internal memory used by reinforcement learning policies. We quantify the amount of relevant past information by estimating the mutual information between behavior histories and the agent's current action. We perform this estimation in the passive setting, that is, we do not intervene but merely observe the agent's natural behavior. Moreover, we provide a theoretical justification for our approach by showing that it yields an implementation-independent lower bound on the minimal memory capacity of any agent that implements the observed policy. We demonstrate our approach by estimating the memory use of DQN policies on concatenated Atari frames, revealing sharply different memory use across 49 games. Studying memory as information that flows from the past to the current action opens avenues to understand and improve successful reinforcement learning algorithms.
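As a rough illustration of the passive estimation step, the sketch below computes a plug-in estimate of the mutual information between fixed-length behavior histories and the current action from logged trajectories. It is a minimal sketch, not the paper's estimator: it assumes discretized observations and discrete actions, applies no bias correction, and the names `plugin_mutual_information` and `memory_profile` are illustrative.

```python
import numpy as np
from collections import Counter

def plugin_mutual_information(histories, actions):
    """Plug-in (empirical-frequency) estimate of I(history; action).

    `histories` and `actions` are parallel sequences of hashable symbols,
    e.g. tuples of discretized observations and discrete action ids.
    """
    n = len(actions)
    joint = Counter(zip(histories, actions))
    hist_counts = Counter(histories)
    act_counts = Counter(actions)
    mi = 0.0
    for (h, a), c in joint.items():
        # p(h,a) * log2( p(h,a) / (p(h) p(a)) ), with probabilities as counts/n
        mi += (c / n) * np.log2(c * n / (hist_counts[h] * act_counts[a]))
    return mi

def memory_profile(observations, actions, max_k=4):
    """I(last k observations; current action) for k = 1..max_k.

    Growth of the estimate with k is a (biased, finite-sample) indication
    that the observed policy uses information beyond the latest observation.
    """
    profile = {}
    for k in range(1, max_k + 1):
        hist = [tuple(observations[t - k + 1:t + 1])
                for t in range(k - 1, len(actions))]
        acts = actions[k - 1:]
        profile[k] = plugin_mutual_information(hist, acts)
    return profile
```

In practice the plug-in estimate overestimates mutual information for long histories with few samples, so a bias-corrected entropy estimator would be a natural substitute; this sketch only conveys the quantity being estimated.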
