Applying Online Search Techniques to Continuous-State Reinforcement Learning

In this paper, we describe methods for efficiently computing better solutions to control problems in continuous state spaces. We provide algorithms that exploit online search to boost the power of very approximate value functions discovered by traditional reinforcement learning techniques. We examine local searches, in which the agent performs a finite-depth lookahead, and global searches, in which the agent searches for a complete trajectory from the current state to a goal state. The key to the success of the local methods lies in taking a value function, which gives a rough solution to the hard problem of finding good trajectories from every single state, and combining that with online search, which then gives an accurate solution to the easier problem of finding a good trajectory specifically from the current state. The key to the success of the global methods lies in using aggressive state-space search techniques such as uniform-cost search and A*, tamed into a tractable form by exploiting neighborhood relations and trajectory constraints that arise from continuous-space dynamic control.
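
As a concrete illustration of the local method, here is a minimal sketch of finite-depth lookahead in Python. It assumes a deterministic simulator `step(state, action) -> (next_state, cost)`, a discretized action set `actions(state)`, and a learned approximate value function `V(state)` giving estimated cost-to-go; all of these names are hypothetical stand-ins, not an interface from the paper.

```python
import math

def lookahead(state, depth, step, actions, V):
    """Estimate the cost-to-go of `state` by searching `depth` steps
    ahead and evaluating the approximate value function V at the leaves."""
    if depth == 0:
        return V(state)
    best = math.inf
    for a in actions(state):
        next_state, cost = step(state, a)
        best = min(best, cost + lookahead(next_state, depth - 1, step, actions, V))
    return best

def act(state, depth, step, actions, V):
    """Choose the action minimizing one-step cost plus lookahead value."""
    best_action, best_q = None, math.inf
    for a in actions(state):
        next_state, cost = step(state, a)
        q = cost + lookahead(next_state, depth - 1, step, actions, V)
        if q < best_q:
            best_action, best_q = a, q
    return best_action
```

The work per decision grows exponentially with depth, but even a shallow lookahead can correct large local errors in V near the current state, which is why pairing a rough value function with online search pays off.

For the global method, the sketch below runs A* from the current state toward a goal, taming the continuous space by hashing each state to a coarse grid cell and expanding each cell at most once. This cell-based pruning is one plausible reading of the "neighborhood relations" mentioned above, not necessarily the paper's exact construction; the helpers `cell`, `h` (an admissible heuristic, e.g. derived from the learned value function), and `is_goal` are likewise assumed.

```python
import heapq
from itertools import count

def astar(start, step, actions, cell, h, is_goal):
    """A* over a continuous state space, pruning any state whose grid
    cell has already been expanded. Returns an action sequence or None."""
    tie = count()  # tiebreaker so the heap never compares raw states
    frontier = [(h(start), 0.0, next(tie), start, [])]  # (f, g, _, state, plan)
    closed = set()  # grid cells already expanded
    while frontier:
        _, g, _, state, plan = heapq.heappop(frontier)
        if is_goal(state):
            return plan
        c = cell(state)
        if c in closed:
            continue
        closed.add(c)
        for a in actions(state):
            next_state, cost = step(state, a)
            if cell(next_state) not in closed:
                g2 = g + cost
                heapq.heappush(frontier,
                               (g2 + h(next_state), g2, next(tie),
                                next_state, plan + [a]))
    return None
```

Uniform-cost search is the special case `h(state) == 0`; the coarser the cells, the more aggressive (and less exact) the pruning that keeps the search tractable.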
