Stochastic real-valued reinforcement learning to solve a nonlinear control problem

This paper presents a new reinforcement learning (RL) approach for efficiently solving nonlinear control problems in which the state and action spaces are continuous. We propose a hierarchical RL algorithm built from two simple components: local linear controllers and TD learning. The continuous state space is discretized into an array of coarse boxes, and each box has its own local linear controller for choosing primitive continuous actions. The higher level of the hierarchy accumulates state values in a table with one entry per box. Each linear controller improves its local control policy using an actor-critic method. The algorithm was applied to a simulated cart-pole swing-up problem, where it found feasible solutions in less time than conventional discrete RL methods.
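
To make the structure concrete, here is a minimal Python sketch of a box-based actor-critic of the kind the abstract describes: a TD(0) critic with one value entry per coarse box, and a per-box linear controller (actor) with Gaussian exploration whose gains are nudged in the direction of positive TD error. This is an illustration under stated assumptions, not the paper's exact formulation; all names and constants (`BoxActorCritic`, `alpha_v`, `alpha_w`, `sigma`, etc.) are hypothetical.

```python
import numpy as np

class BoxActorCritic:
    """Hypothetical sketch: coarse-box critic table + local linear actors."""

    def __init__(self, n_boxes, state_dim, alpha_v=0.1, alpha_w=0.01,
                 gamma=0.95, sigma=0.5):
        self.V = np.zeros(n_boxes)                    # critic: one value per box
        self.W = np.zeros((n_boxes, state_dim + 1))   # actor: linear gains + bias per box
        self.alpha_v, self.alpha_w = alpha_v, alpha_w # critic / actor learning rates
        self.gamma, self.sigma = gamma, sigma         # discount, exploration noise

    def act(self, box, state):
        """Local linear controller with Gaussian exploration noise."""
        x = np.append(state, 1.0)                     # augment state with bias term
        noise = np.random.normal(0.0, self.sigma)
        action = self.W[box] @ x + noise              # continuous primitive action
        return action, noise, x

    def update(self, box, x, noise, reward, next_box):
        """TD(0) critic update; actor reinforces perturbations with positive TD error."""
        delta = reward + self.gamma * self.V[next_box] - self.V[box]
        self.V[box] += self.alpha_v * delta
        self.W[box] += self.alpha_w * delta * noise * x
```

In use, the box index would come from the coarse discretization of the continuous state (e.g., binning cart position, cart velocity, pole angle, and angular velocity), with `act` and `update` called once per control step.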
