Abstract State Transition Graphs for Model-Based Reinforcement Learning

Skill acquisition methods in Reinforcement Learning (RL) solve problems by breaking them into smaller sub-problems, allowing the learning agent to reuse the resulting skills on similar problems. Many of these skill acquisition methods rely on a State Transition Graph (STG). However, STGs are only practical for simple RL problems: for complex problems, the resulting graph becomes too large to handle. In this paper, we propose a method for creating Abstract State Transition Graphs (ASTGs) that fuse structurally similar states into a single abstract state. We show that an ASTG is capable of: (i) efficiently identifying similar states; (ii) greatly reducing the number of states of an STG; and (iii) detecting temporal features, thus enabling the differentiation of states based on their predecessors. This allows the ASTG to be (i) more accurate, since it merges only states that are similar and have similar predecessors, and (ii) manageable in size.
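To make the idea of fusing structurally similar states concrete, here is a minimal sketch; it is not the paper's algorithm, and the toy graph, action names, and the structural_signature key are all hypothetical. It groups the states of a small STG by a simple structural signature (their outgoing transitions) and fuses states that share a signature into one abstract state.

```python
from collections import defaultdict

# Toy STG: state -> list of (action, next_state) transitions.
stg = {
    "s0": [("right", "s1"), ("down", "s2")],
    "s1": [("right", "s3"), ("down", "s3")],
    "s2": [("right", "s3"), ("down", "s3")],  # same local structure as s1
    "s3": [],
}

def structural_signature(state, graph):
    """Hypothetical similarity key: the sorted (action, successor) pairs
    leaving the state. The ASTG described in the abstract additionally
    distinguishes states by their predecessors (temporal features),
    which this toy signature deliberately omits."""
    return tuple(sorted(graph[state]))

# Group concrete states that share a signature into one abstract state.
abstract_states = defaultdict(list)
for s in stg:
    abstract_states[structural_signature(s, stg)].append(s)

for signature, members in abstract_states.items():
    print(members, "->", signature)
# s1 and s2 share a signature, so they are fused into a single abstract
# state, shrinking the graph while preserving its transition structure.
```

Under these assumptions, the abstract graph has three nodes instead of four; on large STGs the same grouping idea is what keeps the abstraction manageable in size.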
