On Determinism Handling While Learning Reduced State Space Representations

When a Reinforcement Learning technique is applied to problems with continuous or very large state spaces, some form of generalization is required. Two main approaches can be found in the literature. On the one hand, generalization can be cast as approximation of the continuous value function, typically with neural networks. On the other hand, the states of the original space can be discretized or clustered to obtain a reduced representation over which a discrete value table is learned. Both methods have drawbacks, however, such as the non-determinism introduced by the discretization, parameters that are hard for the user to tune, or high resource requirements. In this paper, we combine characteristics of both approaches to obtain state space representations that make it possible to approximate the value function in deterministic reinforcement learning problems. The method clusters the domain under the supervision of the value function being learned, so that no non-determinism is introduced; at the same time, the size of the new representation remains small and is computed automatically. Experiments show improvements over other approaches such as uniform discretization or unsupervised clustering.
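
A minimal sketch of the general idea follows, assuming a toy deterministic 1-D chain task and a value-supervised refinement rule that bisects any discretization cell whose sampled value estimates disagree by more than a threshold. This is only an illustration of supervising the state space reduction with the value function being learned; the task, the step/cell_of/refine helpers, the thresholds, and the splitting criterion are assumptions made for the example, not the algorithm described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy deterministic 1-D chain: state in [0, 1], actions move +/-0.05,
# reward 1 when the goal region s >= 0.95 is reached (episode ends).
STEP, GOAL, GAMMA, ALPHA = 0.05, 0.95, 0.95, 0.5

def step(s, a):
    s2 = np.clip(s + (STEP if a == 1 else -STEP), 0.0, 1.0)
    done = s2 >= GOAL
    return s2, (1.0 if done else 0.0), done

def cell_of(s, edges):
    # Index of the discretization cell containing continuous state s.
    if s >= edges[-1]:
        return len(edges) - 2
    return int(np.searchsorted(edges, s, side="right") - 1)

def refine(edges, samples, values, tol=0.1):
    # Bisect every cell whose observed value estimates spread more than `tol`,
    # i.e. where the current abstraction lumps together states that the
    # learned value function distinguishes.
    out = [edges[0]]
    for lo, hi in zip(edges[:-1], edges[1:]):
        inside = (samples >= lo) & (samples < hi)
        if inside.any() and np.ptp(values[inside]) > tol:
            out.append(0.5 * (lo + hi))
        out.append(hi)
    return np.array(out)

edges = np.linspace(0.0, 1.0, 5)            # coarse uniform start: 4 cells
for phase in range(4):                      # alternate learning and refinement
    Q = np.zeros((len(edges) - 1, 2))       # discrete value table over cells
    seen_s, seen_v = [], []
    for episode in range(200):
        s = rng.uniform(0.0, 0.9)
        for _ in range(100):
            c = cell_of(s, edges)
            a = rng.integers(2) if rng.random() < 0.2 else int(Q[c].argmax())
            s2, r, done = step(s, a)
            target = r if done else r + GAMMA * Q[cell_of(s2, edges)].max()
            Q[c, a] += ALPHA * (target - Q[c, a])
            seen_s.append(s)
            seen_v.append(Q[c].max())
            s = s2
            if done:
                break
    edges = refine(edges, np.array(seen_s), np.array(seen_v))
    print(f"phase {phase}: {len(edges) - 1} cells")
```

In this toy setting the representation is refined only where the learned values actually disagree, so the resulting table stays small while the regions where the value function changes quickly end up with finer cells.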
