Two steps reinforcement learning

When applying reinforcement learning in domains with very large or continuous state spaces, the experience gathered by the learning agent through interaction with the environment must be generalized. Generalization methods usually approximate the value functions used to compute the action policy, and they do so in two different ways: either by approximating the value functions with a supervised learning method, or by discretizing the environment and using a tabular representation of the value functions. In this work, we propose an algorithm that combines both approaches to exploit the benefits of both mechanisms and achieve higher performance. The approach is based on two learning phases. In the first, a supervised function approximator is trained with a machine learning technique that also outputs a state space discretization of the environment, as nearest prototype classifiers or decision trees do. In the second learning phase, the discretization computed in the first phase is used to build a tabular representation of the value function learned in that phase, allowing this approximation to be further tuned. Experiments in different domains show that executing both learning phases improves the results obtained by executing only the first one, taking into account both the resources used and the performance of the learned behavior. © 2008 Wiley Periodicals, Inc.
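The two-phase structure can be sketched in a few lines of Python. The snippet below is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a hypothetical environment interface (env.reset() returning a state array, env.step(action) returning (next_state, reward, done)), uses Lloyd's k-means as the nearest-prototype discretizer, and simplifies the first phase to exploratory data collection plus vector quantization rather than a full supervised value-function approximation; the second phase then runs standard tabular Q-learning over the learned prototypes.

import numpy as np

def lloyd_quantizer(states, n_prototypes, n_iters=20, rng=None):
    # Phase 1 (sketch): learn a nearest-prototype discretization of the
    # continuous state space with Lloyd's k-means algorithm.
    rng = np.random.default_rng(rng)
    prototypes = states[rng.choice(len(states), n_prototypes, replace=False)]
    for _ in range(n_iters):
        # Assign every visited state to its nearest prototype.
        dists = np.linalg.norm(states[:, None, :] - prototypes[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_prototypes):
            if np.any(labels == k):
                prototypes[k] = states[labels == k].mean(axis=0)
    return prototypes

def discretize(state, prototypes):
    # Map a continuous state to the index of its nearest prototype.
    return int(np.linalg.norm(prototypes - state, axis=1).argmin())

def two_phase_learning(env, n_actions, n_prototypes=64,
                       episodes_phase1=200, episodes_phase2=500,
                       alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)

    # ---- Phase 1: collect experience, learn the state space discretization ----
    visited = []
    for _ in range(episodes_phase1):
        state, done = env.reset(), False
        while not done:
            visited.append(state)
            action = int(rng.integers(n_actions))    # exploratory policy
            state, reward, done = env.step(action)
    prototypes = lloyd_quantizer(np.asarray(visited, dtype=float), n_prototypes, rng=rng)

    # ---- Phase 2: tabular Q-learning over the discretized space ----
    Q = np.zeros((n_prototypes, n_actions))
    for _ in range(episodes_phase2):
        state, done = env.reset(), False
        s = discretize(state, prototypes)
        while not done:
            if rng.random() < epsilon:               # epsilon-greedy action selection
                a = int(rng.integers(n_actions))
            else:
                a = int(Q[s].argmax())
            next_state, reward, done = env.step(a)
            s2 = discretize(next_state, prototypes)
            # Q-learning update over the tabular representation, tuning the
            # value function defined on the learned discretization.
            Q[s, a] += alpha * (reward + gamma * (0.0 if done else Q[s2].max()) - Q[s, a])
            s = s2
    return prototypes, Q

Initializing the table from the phase-one value-function approximator, which the sketch omits, would follow the same structure: evaluate the approximator at each prototype before the tabular updates begin.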
