State Abstraction as Compression in Apprenticeship Learning

State abstraction can give rise to models of environments that are both compressed and useful, thereby enabling efficient sequential decision making. In this work, we offer the first formalism and analysis of the trade-off between compression and performance in the context of state abstraction for Apprenticeship Learning. We build on Rate-Distortion theory, the classic Blahut-Arimoto algorithm, and the Information Bottleneck method to develop an algorithm for computing state abstractions that approximate the optimal trade-off between compression and performance. We illustrate the power of this algorithmic structure to offer insights into effective abstraction, compression, and reinforcement learning through a mixture of analysis, visuals, and experimentation.
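
To make the Blahut-Arimoto machinery referenced above concrete, the sketch below implements the classic alternating-minimization update for the rate-distortion trade-off: given a trade-off parameter beta, it alternates between recomputing the marginal over abstract states and reweighting the stochastic abstraction q(x_hat | x) against the distortion. This is a minimal sketch of the textbook algorithm, not the paper's own procedure; the names (`blahut_arimoto`, `p_x`, `distortion`, `beta`) and the example distortion matrix are illustrative assumptions, and in the apprenticeship-learning setting the distortion would be tied to loss in behavioral performance rather than a generic cost.

```python
# Minimal sketch of the classic Blahut-Arimoto iteration for the
# rate-distortion trade-off. All names are illustrative, not from the paper.
import numpy as np

def blahut_arimoto(p_x, distortion, beta, n_iters=200, tol=1e-10):
    """Compute an approximately optimal channel q(x_hat | x) at trade-off beta.

    p_x:        (n,) source distribution over ground states x.
    distortion: (n, m) distortion d(x, x_hat) between ground and abstract states.
    beta:       Lagrange multiplier controlling compression vs. fidelity.
    Returns q(x_hat | x) as an (n, m) array, the rate I(X; X_hat) in bits,
    and the expected distortion achieved.
    """
    n, m = distortion.shape
    q_cond = np.full((n, m), 1.0 / m)              # q(x_hat | x), start uniform
    for _ in range(n_iters):
        q_marg = p_x @ q_cond                      # q(x_hat) = sum_x p(x) q(x_hat | x)
        # Update: q(x_hat | x) proportional to q(x_hat) * exp(-beta * d(x, x_hat))
        log_q = np.log(q_marg + 1e-300) - beta * distortion
        log_q -= log_q.max(axis=1, keepdims=True)  # numerical stability
        new_q = np.exp(log_q)
        new_q /= new_q.sum(axis=1, keepdims=True)
        if np.max(np.abs(new_q - q_cond)) < tol:
            q_cond = new_q
            break
        q_cond = new_q
    q_marg = p_x @ q_cond
    ratio = q_cond / (q_marg + 1e-300)
    rate = np.sum(p_x[:, None] * q_cond * np.log2(ratio + 1e-300))
    exp_dist = np.sum(p_x[:, None] * q_cond * distortion)
    return q_cond, rate, exp_dist

# Hypothetical usage: 4 ground states compressed toward 2 abstract states.
# p = np.ones(4) / 4
# d = np.array([[0., 1.], [0., 1.], [1., 0.], [1., 0.]])
# q, rate, dist = blahut_arimoto(p, d, beta=5.0)
```

Sweeping beta from small to large traces out the compression-performance curve implied by the abstract: low beta favors aggressive compression into few effective abstract states, while high beta favors fidelity to the demonstrated behavior at the cost of a higher rate.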
