The value of abstraction

Abstractions Guide Exploration and Generalization

Suppose that during your camping trip you want to learn to fish. At first, you try a few spots along the river at random. But soon you notice a pattern: certain areas have more vegetation, others have less. This observation gives you a more efficient way to organize your fishing attempts. For example, you could fish both high- and low-vegetation areas to gain a range of experience about how vegetation affects your catch. Or, having learned on your first trip that your best catch came from high-vegetation areas, you might seek out similar areas on a second trip to a different river. Here, the abstract concept of river vegetation guides your exploration in the current fishing task and tracks a generalizable feature relevant to future tasks, both of which let you make better use of your limited time and experience. In RL, abstractions similarly facilitate efficient learning by guiding exploration and generalization. But what is the basis of this guidance? Put another way, the concept of vegetation is useful, but what determines the identification of such a concept in general? Below, we discuss two ways in which abstractions can guide learning: domain structure and representational simplicity.
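To make this concrete, the following is a minimal sketch (not from the paper) of how a state abstraction can guide both generalization and exploration in tabular RL: a function phi maps raw states to abstract states, Q-values are shared across states with the same abstract description, and a count-based bonus over abstract states directs exploration. The names (phi, AbstractQAgent, bonus_scale, vegetation_phi) and the fishing-style abstraction are illustrative assumptions, not the authors' method.

```python
# Sketch: tabular Q-learning over an abstraction phi(s).
# Generalization: Q-values are indexed by the abstract state, so experience
# at one river location transfers to all locations with the same abstraction.
# Exploration: a count-based bonus over abstract states encourages trying
# under-visited kinds of places (e.g., low-vegetation areas).
from collections import defaultdict


class AbstractQAgent:
    def __init__(self, actions, phi, alpha=0.1, gamma=0.95, bonus_scale=1.0):
        self.actions = actions          # available actions
        self.phi = phi                  # abstraction: raw state -> abstract state
        self.alpha, self.gamma = alpha, gamma
        self.bonus_scale = bonus_scale  # weight on the exploration bonus
        self.q = defaultdict(float)     # Q-values keyed by (abstract state, action)
        self.counts = defaultdict(int)  # visit counts over (abstract state, action)

    def act(self, state):
        z = self.phi(state)

        # Optimistic score: value estimate plus a count-based novelty bonus.
        def score(a):
            bonus = self.bonus_scale / (1 + self.counts[(z, a)]) ** 0.5
            return self.q[(z, a)] + bonus

        return max(self.actions, key=score)

    def update(self, state, action, reward, next_state):
        z, z_next = self.phi(state), self.phi(next_state)
        self.counts[(z, action)] += 1
        target = reward + self.gamma * max(self.q[(z_next, a)] for a in self.actions)
        self.q[(z, action)] += self.alpha * (target - self.q[(z, action)])


# Hypothetical abstraction for the fishing analogy: collapse river locations
# to a coarse vegetation feature, so experience generalizes across similar spots.
def vegetation_phi(location):
    return "high_veg" if location.get("vegetation", 0.0) > 0.5 else "low_veg"


agent = AbstractQAgent(actions=["fish_here", "move_on"], phi=vegetation_phi)
```

Under such an abstraction, the agent never distinguishes individual river bends; it only learns and explores over the coarse feature, which is exactly what makes its limited experience reusable on a new river.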
