Efficient Non-Linear Control by Combining Q-learning with Local Linear Controllers

This paper presents a new approach to reinforcement learning (RL) for efficiently solving non-linear control problems in which the state and action spaces are continuous. In real-world applications, an approach combining discrete RL methods with linear controllers is promising, since many non-linear control problems can be decomposed into several local linear-control tasks. We provide a hierarchical RL algorithm composed of two very simple components: local linear controllers and Q-learning. The continuous state-action space is discretized into an array of coarse boxes, and each box has its own local linear controller as an abstract action. The higher level of the hierarchy is a conventional discrete RL algorithm that chooses among the abstract actions. Each linear controller improves its local control policy by an actor-critic method. Coarse state-space discretization is a simple way to cope with the curse of dimensionality, but it often gives rise to non-Markovian effects. In our approach, the local linear controllers compensate for these undesirable effects. The algorithm was applied to a simulated cart-pole swing-up task, and it found feasible solutions in less time than conventional discrete RL methods.
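To make the architecture concrete, the following is a minimal sketch, not the authors' implementation. It assumes a coarse box discretization of the state space, epsilon-greedy Q-learning over abstract actions at the upper level, and one local linear controller per (box, abstract action) pair whose mean output u = Kx + b is nudged by an actor-style update. Class names, dimensions, learning rates, and the reuse of the box-level TD error as the local training signal are all illustrative assumptions.

```python
# Minimal sketch of the hierarchical scheme described above (NOT the paper's code).
# Upper level: tabular Q-learning over coarse state boxes and abstract actions.
# Lower level: each abstract action delegates to a local linear controller.
import numpy as np


class LocalLinearController:
    """Local linear policy u = K x + b with Gaussian exploration noise."""

    def __init__(self, state_dim, action_dim, sigma=0.2, alpha=0.01):
        self.K = np.zeros((action_dim, state_dim))
        self.b = np.zeros(action_dim)
        self.sigma = sigma   # exploration noise scale
        self.alpha = alpha   # actor learning rate

    def act(self, x):
        mean = self.K @ x + self.b
        noise = self.sigma * np.random.randn(len(mean))
        return mean + noise, noise

    def update(self, x, noise, td_error):
        # Actor-style update: reinforce the exploratory deviation in
        # proportion to the TD error supplied by the critic.
        self.K += self.alpha * td_error * np.outer(noise, x)
        self.b += self.alpha * td_error * noise


class HierarchicalAgent:
    """Epsilon-greedy Q-learning over coarse boxes; the chosen abstract
    action hands control to a local linear controller that emits the
    continuous control signal."""

    def __init__(self, state_low, state_high, bins, n_abstract, action_dim,
                 gamma=0.95, alpha_q=0.1, epsilon=0.1):
        self.low = np.asarray(state_low, dtype=float)
        self.high = np.asarray(state_high, dtype=float)
        self.bins = bins
        n_boxes = bins ** len(state_low)
        self.Q = np.zeros((n_boxes, n_abstract))
        self.controllers = [[LocalLinearController(len(state_low), action_dim)
                             for _ in range(n_abstract)]
                            for _ in range(n_boxes)]
        self.gamma, self.alpha_q, self.epsilon = gamma, alpha_q, epsilon

    def box(self, x):
        # Map a continuous state to the index of its coarse box.
        ratios = (np.asarray(x, dtype=float) - self.low) / (self.high - self.low)
        idx = np.clip((ratios * self.bins).astype(int), 0, self.bins - 1)
        return int(np.ravel_multi_index(tuple(idx), (self.bins,) * len(idx)))

    def select(self, x):
        b = self.box(x)
        if np.random.rand() < self.epsilon:
            a = np.random.randint(self.Q.shape[1])
        else:
            a = int(np.argmax(self.Q[b]))
        u, noise = self.controllers[b][a].act(np.asarray(x, dtype=float))
        return b, a, u, noise

    def learn(self, b, a, x, noise, reward, x_next, done):
        # One-step Q-learning over boxes and abstract actions.
        target = reward if done else reward + self.gamma * self.Q[self.box(x_next)].max()
        td_error = target - self.Q[b, a]
        self.Q[b, a] += self.alpha_q * td_error
        # The same TD error drives the local controller's improvement.
        self.controllers[b][a].update(np.asarray(x, dtype=float), noise, td_error)
```

A step of interaction would call `select`, apply the returned `u` to the plant, and then call `learn` with the observed reward and next state. The sketch reuses the box-level TD error as the local actor's training signal; the paper instead equips each local controller with its own actor-critic, so this is a deliberate simplification.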
