Automatic construction and evaluation of macro-actions in reinforcement learning

Abstract In this paper, we propose a new subgoal-based method for the automatic construction of useful macro-actions modeled within the options framework. We introduce a new community detection algorithm that provides an appropriate partitioning of the agent's transition graph. Subgoals are taken to be the border states of the transition-graph communities, and options are constructed to take the agent from one community to other communities. Despite the importance of the effect each macro-action has on learning speed, no generic mechanism for evaluating macro-actions is known in the literature. We show that using all of the detected macro-actions is not useful: even in a simple environment, augmenting the action space with useless or wrong macro-actions can easily worsen learning performance. We propose four different heuristics for evaluating options; in this way, we identify inappropriate options and discard them from the agent's choices. Experimental results show significant improvements in learning speed after pruning options.
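As a rough illustration of this pipeline, the sketch below builds a transition graph from observed transitions, partitions it into communities, takes border states as subgoals, and constructs one option per (community, subgoal) pair. It is a minimal sketch only: it assumes a small tabular domain, uses networkx's greedy modularity communities in place of the paper's own community detection algorithm, and the helper names (build_transition_graph, find_subgoals, make_option) are hypothetical.

```python
# Minimal sketch of subgoal discovery and option construction on a tabular domain.
# Assumptions: transitions are logged as (state, next_state) pairs; networkx's
# greedy modularity communities stand in for the paper's community detection
# algorithm; all helper names below are hypothetical.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities


def build_transition_graph(transitions):
    """Build the agent's transition graph from observed (state, next_state) pairs."""
    g = nx.Graph()
    g.add_edges_from(transitions)
    return g


def find_subgoals(graph, communities):
    """Subgoals are border states: states with a neighbour in another community."""
    membership = {s: i for i, comm in enumerate(communities) for s in comm}
    subgoals = set()
    for u, v in graph.edges():
        if membership[u] != membership[v]:
            subgoals.update((u, v))
    return subgoals


def make_option(community, subgoal):
    """An option that drives the agent from anywhere in `community` to `subgoal`.

    Initiation set: the community; termination: reaching the subgoal or leaving
    the community. The internal policy would be learned separately, e.g. by
    Q-learning on a pseudo-reward for reaching the subgoal (omitted here).
    """
    return {
        "initiation_set": set(community),
        "terminate": lambda s: s == subgoal or s not in community,
        "policy": None,  # to be learned offline or online
    }


# Toy example: two "rooms" (0-3 and 4-7) joined by a single doorway edge (3, 4).
transitions = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3),
               (3, 4),
               (4, 5), (5, 6), (6, 7), (4, 6), (5, 7)]

graph = build_transition_graph(transitions)
communities = list(greedy_modularity_communities(graph))
subgoals = find_subgoals(graph, communities)
options = [
    make_option(comm, sg)
    for comm in communities
    for sg in subgoals
    if sg in comm or any(graph.has_edge(sg, s) for s in comm)
]
print(f"{len(communities)} communities, subgoals {sorted(subgoals)}, {len(options)} options")
```

In the paper's method, each constructed option would then be scored with the proposed evaluation heuristics, and poorly rated options would be discarded before the agent's action set is augmented.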
