Using Strongly Connected Components as a Basis for Autonomous Skill Acquisition in Reinforcement Learning

Hierarchical reinforcement learning (HRL) has found a wide range of applications in recent years, and devising mechanisms for the autonomous acquisition of skills has been a central topic of research in this area. While various methods have been proposed to this end, few succeed in both performance and efficiency in terms of the algorithm's time complexity. In this paper, a linear-time algorithm is proposed that identifies subgoal states of the environment in the early episodes of learning. Having subgoals available in the early phases of a learning task makes it possible to build skills that dramatically increase the convergence rate of the learning process.
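
The abstract does not spell out the algorithm, but the title points to strongly connected components (SCCs), which can be computed in linear time with Tarjan's algorithm. As an illustration only, and not a reproduction of the paper's method, the sketch below builds a transition graph from early-episode experience, decomposes it into SCCs, and treats states reached by edges that cross SCC boundaries as subgoal candidates. All names (`tarjan_scc`, `candidate_subgoals`, the toy episode) are hypothetical.

```python
from collections import defaultdict


def tarjan_scc(graph):
    """Compute strongly connected components of a directed graph in O(V + E)
    time with Tarjan's algorithm.  `graph` maps each node to an iterable of
    successors; the return value maps each node to an integer SCC id."""
    index_of, lowlink, on_stack = {}, {}, set()
    stack, scc_id = [], {}
    counter, next_id = [0], [0]

    def strongconnect(v):
        index_of[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index_of:
                strongconnect(w)          # recursive for brevity; huge graphs need an iterative version
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index_of[w])
        if lowlink[v] == index_of[v]:     # v is the root of a new SCC
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc_id[w] = next_id[0]
                if w == v:
                    break
            next_id[0] += 1

    for v in list(graph):
        if v not in index_of:
            strongconnect(v)
    return scc_id


def candidate_subgoals(transitions):
    """Build a transition graph from observed (state, next_state) pairs,
    decompose it into SCCs, and return the states reached by edges that
    cross from one SCC into another -- a simple proxy for 'doorway' states."""
    graph = defaultdict(set)
    for s, s_next in transitions:
        graph[s].add(s_next)
        if s_next not in graph:
            graph[s_next] = set()          # ensure terminal states appear as nodes
    scc_id = tarjan_scc(graph)
    return {s_next
            for s, successors in graph.items()
            for s_next in successors
            if scc_id[s] != scc_id[s_next]}


if __name__ == "__main__":
    # Two small "rooms" (cycles 0-1-2 and 4-5-6) joined through state 3.
    episode = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 6), (6, 4)]
    print(candidate_subgoals(episode))     # -> {3, 4}
```

In an options-style pipeline, candidates found this way would typically seed the initiation sets and pseudo-reward functions of new skills; that step is outside the scope of this sketch.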
