Paper Information - A Concept Filtering Approach for Diverse Density to Discover Subgoals in Reinforcement Learning

In reinforcement learning (RL), subgoal discovery methods aim to find bottlenecks in a problem's state space so that the problem can be decomposed naturally into smaller subproblems. In this paper, we propose a concept filtering method that extends an existing subgoal discovery method, diverse density, so that it can be used for both fully and partially observable RL problems. The proposed method discovers useful subgoals with the help of multiple-instance learning. Compared to the original algorithm, the resulting approach runs significantly faster without sacrificing solution quality. Moreover, it can be used effectively to find observational bottlenecks in problems with perceptually aliased states.
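
For readers unfamiliar with diverse density, the following is a minimal sketch of the standard noisy-or formulation from the multiple-instance learning literature (Maron and Lozano-Pérez), applied to trajectories as bags of states. It is not the paper's algorithm and does not implement the proposed concept filtering step. The function names, the `scale` parameter, the Gaussian-like instance probability, and the toy two-room trajectories (with diagonal moves allowed and a doorway at `(2, 1)`) are all assumptions made for illustration.

```python
import math

def instance_prob(state, concept, scale=1.0):
    """Assumed similarity model: exp(-scale * squared distance) between a state and a concept."""
    d2 = sum((s - c) ** 2 for s, c in zip(state, concept))
    return math.exp(-scale * d2)

def diverse_density(concept, positive_bags, negative_bags, scale=1.0):
    """Noisy-or diverse density of a candidate concept.

    A positive bag (successful trajectory) supports the concept if at least
    one of its states is close to it; every state of a negative bag
    (unsuccessful trajectory) that is close to the concept penalizes it.
    """
    dd = 1.0
    for bag in positive_bags:
        miss = 1.0  # probability that no state in this bag matches the concept
        for state in bag:
            miss *= 1.0 - instance_prob(state, concept, scale)
        dd *= 1.0 - miss
    for bag in negative_bags:
        for state in bag:
            dd *= 1.0 - instance_prob(state, concept, scale)
    return dd

def best_concept(candidates, positive_bags, negative_bags, scale=1.0):
    """Return the candidate state with the highest diverse density."""
    return max(
        candidates,
        key=lambda c: diverse_density(c, positive_bags, negative_bags, scale),
    )

# Hypothetical toy data: every successful trajectory crosses the doorway (2, 1);
# unsuccessful trajectories wander in the left room only.
positive_bags = [
    [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0)],
    [(0, 1), (1, 1), (2, 1), (3, 1), (4, 1)],
    [(0, 2), (1, 2), (2, 1), (3, 2), (4, 2)],
]
negative_bags = [
    [(0, 0), (0, 1), (1, 1), (1, 2)],
    [(0, 2), (0, 1), (1, 0)],
]

# Candidate concepts: every distinct state seen on a successful trajectory.
candidates = sorted({state for bag in positive_bags for state in bag})
subgoal = best_concept(candidates, positive_bags, negative_bags, scale=2.0)
print(subgoal)
```

In this sketch every state visited on a successful trajectory is scored, which is the exhaustive search that a filtering step would presumably aim to shrink. A state that also appears on an unsuccessful trajectory receives a density of exactly zero under this similarity model.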
