Learning Macro-Actions in Reinforcement Learning

We present a method for automatically constructing macro-actions from primitive actions during the reinforcement learning process. The core idea is to reinforce the tendency to perform action b after action a whenever that pattern of actions has been rewarded. We test the method on a bicycle task, the car-on-the-hill task, the race-track task, and several grid-world tasks. For the bicycle and race-track tasks the use of macro-actions roughly halves the learning time, while for one of the grid-world tasks the learning time is reduced by a factor of 5. The method did not work for the car-on-the-hill task, for reasons we discuss in the conclusion.
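The abstract only sketches the idea of reinforcing rewarded action pairs. Below is a minimal, illustrative Python sketch of one way such pair-based macro construction could look in a tabular setting: alongside the usual Q-table, a pair-tendency table is strengthened when action b followed action a on a rewarded step, and pairs whose tendency crosses a threshold are promoted to two-step macro-actions. The table names (Q, M), the promotion threshold, and the exact update rule are assumptions made for illustration, not the paper's formulation.

```python
import random
from collections import defaultdict

# Illustrative sketch (assumptions): a tabular Q-learner that also keeps a
# pair-tendency table M[(a, b)] -- the inclination to follow primitive
# action a with primitive action b.  M is reinforced when the pair
# preceded a reward; pairs whose tendency crosses MACRO_THRESHOLD are
# promoted to two-step macro-actions, represented here as tuples (a, b).

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
MACRO_THRESHOLD = 0.8

Q = defaultdict(float)   # Q[(state, action)] for primitives and macros
M = defaultdict(float)   # M[(a, b)]: tendency to perform b right after a
macros = set()           # discovered (a, b) macro-actions


def select_action(state, primitive_actions):
    """Epsilon-greedy choice over primitive actions plus discovered macros."""
    candidates = list(primitive_actions) + list(macros)
    if random.random() < EPSILON:
        return random.choice(candidates)
    return max(candidates, key=lambda a: Q[(state, a)])


def update(state, action, reward, next_state, primitive_actions, prev_action=None):
    """One-step Q-update plus reinforcement of the (prev_action, action) pair."""
    candidates = list(primitive_actions) + list(macros)
    best_next = max(Q[(next_state, a)] for a in candidates)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

    # Reinforce the tendency to perform `action` after `prev_action` when the
    # pattern was rewarded; decay it slightly otherwise.
    if prev_action is not None and not isinstance(action, tuple):
        pair = (prev_action, action)
        if reward > 0:
            M[pair] += ALPHA * (1.0 - M[pair])
        else:
            M[pair] *= (1.0 - ALPHA)
        if M[pair] > MACRO_THRESHOLD:
            macros.add(pair)  # promote the rewarded pair to a macro-action
```

In this sketch a pair is promoted permanently once its tendency crosses the threshold; the actual method's criteria for forming, executing, and discarding macro-actions may differ.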
