Laplacian using Abstract State Transition Graphs: A Framework for Skill Acquisition

Automatic definition of macro-actions for Reinforcement Learning (RL) is a way of breaking a large problem into smaller sub-problems. Macro-actions are known to boost the agent's learning process, leading to better performance. One recent approach, called the Laplacian Framework, uses the Proto-Value Functions of the State Transition Graph (STG) associated with an RL problem to create options. For larger problems, however, the STG is unavailable. In this context, we propose an improvement upon the Laplacian Framework for large problems, called Laplacian using Abstract State Transition Graphs (LAST-G), which uses an Abstract State Transition Graph (ASTG), a reduced version of the original STG. This approach allows the creation of intra-option policies for the discovered options by using the ASTG as a model of the environment. Our experimental results show that the proposed framework is capable of: (i) effectively creating purposeful options; and (ii) successfully executing the identified options.
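
To make the underlying idea concrete, the sketch below illustrates the general Laplacian-style option-discovery step the abstract refers to: compute Proto-Value Functions as eigenvectors of the normalized Laplacian of a (possibly abstract) state transition graph, and use one eigenvector as an intrinsic "eigenpurpose" reward that an option's internal policy can maximize. This is a minimal, illustrative sketch under stated assumptions, not the authors' LAST-G implementation; the toy chain graph, function names, and reward definition are assumptions for illustration.

```python
# Minimal sketch (illustrative, not the paper's implementation) of
# Laplacian-style option discovery over a state transition graph.
import numpy as np

def proto_value_functions(adjacency: np.ndarray) -> np.ndarray:
    """Eigenvectors of the normalized Laplacian L = I - D^{-1/2} A D^{-1/2},
    ordered by increasing eigenvalue (smoothest Proto-Value Functions first)."""
    degrees = adjacency.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(degrees, 1e-12)))
    laplacian = np.eye(len(adjacency)) - d_inv_sqrt @ adjacency @ d_inv_sqrt
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    return eigenvectors[:, np.argsort(eigenvalues)]

def eigenpurpose_reward(pvf: np.ndarray, s: int, s_next: int) -> float:
    """Intrinsic reward for one eigenoption: change of the PVF along a transition."""
    return float(pvf[s_next] - pvf[s])

# Toy example: a 4-state chain 0-1-2-3 standing in for a (reduced) transition graph.
A = np.zeros((4, 4))
for i in range(3):
    A[i, i + 1] = A[i + 1, i] = 1.0

pvfs = proto_value_functions(A)
second_pvf = pvfs[:, 1]  # first non-constant eigenvector
print(eigenpurpose_reward(second_pvf, 0, 1))  # positive when moving "along" the PVF
```

In the full framework, an option's internal policy would be trained against such an intrinsic reward; using a reduced ASTG instead of the full STG keeps the Laplacian small enough to compute for large problems.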
