Approximation in Model-Based Learning

Model-based reinforcement learning, in which a model of the environment's dynamics is learned and used to supplement direct learning from experience, has been proposed as a general approach to learning and planning. We present experiments with this idea in which the model of the environment's dynamics is both approximate and learned on-line. The experiments use the Mountain Car task, whose continuous state variables require approximation of both the value function and the model. Naive model use is susceptible to modelling errors and can corrupt the value function; we show that excessive reliance on the model performs worse than using no model at all. Hybrid methods that combine model-based and direct updates can mitigate the effects of learning with an inherently inaccurate model.
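To make the setup concrete, the sketch below shows a minimal Dyna-style learner on Mountain Car: Q-values are updated from real experience, a one-step model is learned on-line, and additional "planning" updates are drawn from that model. This is an illustrative assumption of how such an agent could be wired up, not the paper's exact method; in particular, coarse state aggregation stands in for the function approximator used in the paper, and all constants (grid size, step sizes, number of planning steps) are hypothetical.

```python
# Minimal Dyna-Q-style sketch on Mountain Car (illustrative assumptions only).
import math
import random
from collections import defaultdict

# --- standard Mountain Car dynamics: action in {0, 1, 2} = {reverse, coast, forward} ---
def step(pos, vel, action):
    vel += 0.001 * (action - 1) - 0.0025 * math.cos(3 * pos)
    vel = max(-0.07, min(0.07, vel))
    pos = max(-1.2, min(0.6, pos + vel))
    if pos == -1.2:
        vel = 0.0
    return pos, vel, -1.0, pos >= 0.5   # reward of -1 per step until the goal

# --- crude value-function approximation: aggregate the continuous state onto a grid ---
N_BINS = 20
def features(pos, vel):
    i = int((pos + 1.2) / 1.8 * (N_BINS - 1))
    j = int((vel + 0.07) / 0.14 * (N_BINS - 1))
    return (i, j)

Q = defaultdict(float)      # Q[(state_features, action)]
model = {}                  # learned approximate model: (s, a) -> (s', r, done)

ALPHA, GAMMA, EPSILON = 0.5, 1.0, 0.1
PLANNING_STEPS = 10         # simulated (model-based) updates per real step

def greedy(s):
    return max(range(3), key=lambda a: Q[(s, a)])

def q_update(s, a, r, s2, done):
    target = r if done else r + GAMMA * max(Q[(s2, b)] for b in range(3))
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

for episode in range(50):
    pos, vel = random.uniform(-0.6, -0.4), 0.0
    for t in range(10000):
        s = features(pos, vel)
        a = random.randrange(3) if random.random() < EPSILON else greedy(s)
        pos, vel, r, done = step(pos, vel, a)
        s2 = features(pos, vel)

        # direct reinforcement learning from real experience
        q_update(s, a, r, s2, done)

        # learn the (approximate, aggregated) model on-line
        model[(s, a)] = (s2, r, done)

        # planning: replay hypothetical experience drawn from the learned model;
        # any model error feeds straight into Q here, which is the failure mode
        # the abstract associates with excessive model use
        for _ in range(PLANNING_STEPS):
            (ms, ma), (ms2, mr, mdone) = random.choice(list(model.items()))
            q_update(ms, ma, mr, ms2, mdone)

        if done:
            break
```

Setting PLANNING_STEPS to zero recovers plain model-free Q-learning, while very large values lean heavily on the learned model; the abstract's hybrid methods correspond to keeping both the direct update and a moderated amount of planning.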
