VQQL. Applying Vector Quantization to Reinforcement Learning

Reinforcement learning has proven to be a successful set of techniques for finding optimal policies in uncertain and/or dynamic domains, such as RoboCup. One of the problems in applying such techniques arises with large state and action spaces, as is the case for the input information coming from the Robosoccer simulator. In this paper, we describe a new mechanism for solving the state generalization problem in reinforcement learning algorithms. This clustering mechanism is based on the vector quantization technique for analog-to-digital signal conversion and compression, and on the Generalized Lloyd Algorithm for the design of vector quantizers. Furthermore, we present the VQQL model, which integrates Q-Learning as the reinforcement learning technique and vector quantization as the state generalization technique. We show results of applying this model to learning the interception skill for Robosoccer agents.
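To make the codebook-design step concrete, here is a minimal sketch of the Generalized Lloyd Algorithm for designing a vector quantizer from a set of training state vectors. The function names, the random initialization of the codebook, and the stopping tolerance are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def gla(samples, k, n_iters=50, tol=1e-6, rng=None):
    """Generalized Lloyd Algorithm: design a k-codeword codebook that
    reduces the mean squared quantization error over the sample set."""
    rng = np.random.default_rng(rng)
    # Initialize the codebook with k distinct training vectors (illustrative choice).
    codebook = samples[rng.choice(len(samples), size=k, replace=False)].astype(float)
    prev_distortion = np.inf
    for _ in range(n_iters):
        # Nearest-neighbor condition: assign each sample to its closest codeword.
        dists = np.linalg.norm(samples[:, None, :] - codebook[None, :, :], axis=2)
        assignments = dists.argmin(axis=1)
        # Centroid condition: move each codeword to the mean of its cell.
        for j in range(k):
            cell = samples[assignments == j]
            if len(cell) > 0:
                codebook[j] = cell.mean(axis=0)
        distortion = (dists[np.arange(len(samples)), assignments] ** 2).mean()
        # Stop when the relative distortion improvement falls below the tolerance.
        if prev_distortion - distortion < tol * prev_distortion:
            break
        prev_distortion = distortion
    return codebook

def quantize(state, codebook):
    """Map a continuous state vector to the index of its nearest codeword."""
    return int(np.linalg.norm(codebook - state, axis=1).argmin())
```

The two alternating steps are the standard optimality conditions for a quantizer: partition the samples by nearest codeword, then recompute each codeword as the centroid of its cell.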

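The quantizer then plugs into tabular Q-Learning by using the codeword index as the discrete state. The sketch below shows one episode of this combination; the environment interface (`reset`/`step`), the reward signal, and the hyperparameters are placeholder assumptions rather than the paper's experimental setup, and `quantize` comes from the sketch above:

```python
import numpy as np  # quantize() is defined in the previous sketch

def vqql_episode(env, codebook, Q, actions, alpha=0.1, gamma=0.9, epsilon=0.1, rng=None):
    """One tabular Q-Learning episode over vector-quantized states.
    `env` is assumed to expose reset() -> obs and step(a) -> (obs, reward, done)."""
    rng = np.random.default_rng(rng)
    s = quantize(env.reset(), codebook)
    done = False
    while not done:
        # Epsilon-greedy action selection over the discrete codeword index.
        if rng.random() < epsilon:
            a = int(rng.integers(len(actions)))
        else:
            a = int(Q[s].argmax())
        next_obs, reward, done = env.step(actions[a])
        s_next = quantize(next_obs, codebook)
        # Standard Q-Learning update, applied to the quantized state indices.
        target = reward + gamma * (0.0 if done else Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
    return Q
```

A Q-table of shape `(len(codebook), len(actions))`, initialized to zeros, would be trained by running this episode loop repeatedly; the codebook reduces the continuous Robosoccer state space to a finite index set that tabular Q-Learning can handle.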