Applying Online Search Techniques to Reinforcement Learning

In reinforcement learning it is frequently necessary to resort to an approximation to the true optimal value function. Here we investigate the benefits of online search in such cases. We examine "local" searches, where the agent performs a finite-depth lookahead search, and "global" searches, where the agent performs a search for a trajectory all the way from the current state to a goal state. The key to the success of these methods lies in taking a value function, which gives a rough solution to the hard problem of finding good trajectories from every single state, and combining that with online search, which then gives an accurate solution to the easier problem of finding a good trajectory specifically from the current state.
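
As an illustration of the local case, here is a minimal sketch, not the paper's implementation, of depth-limited lookahead that backs up simulated rewards and evaluates leaf states with the approximate value function. The `model`, `actions`, and `value_fn` arguments are hypothetical placeholders for a one-step simulator, the action set, and the learned value function:

```python
def q_lookahead(state, action, depth, model, actions, value_fn, gamma=0.99):
    """Backed-up value of taking `action` in `state`: search `depth` steps
    ahead with the model, then evaluate leaf states with the approximate
    value function."""
    next_state, reward, done = model(state, action)  # one-step simulation
    if done:
        return reward
    if depth <= 1:
        # Leaf of the search tree: fall back on the rough approximation.
        return reward + gamma * value_fn(next_state)
    # Interior node: back up the best continuation value.
    return reward + gamma * max(
        q_lookahead(next_state, a, depth - 1, model, actions, value_fn, gamma)
        for a in actions)

def select_action(state, depth, model, actions, value_fn, gamma=0.99):
    """Act greedily with respect to the depth-limited lookahead values."""
    return max(actions, key=lambda a: q_lookahead(state, a, depth, model,
                                                  actions, value_fn, gamma))
```

The global variant would instead search for a complete trajectory from the current state to a goal state, for example with an A*-style search that uses the learned value function as its heuristic; the sketch above covers only the finite-depth case.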
