The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces

Parti-game is a new algorithm for learning feasible trajectories to goal regions in high-dimensional continuous state-spaces. In high dimensions it is essential that neither planning nor exploration occurs uniformly over the state-space. Parti-game maintains a decision-tree partitioning of state-space and applies techniques from game theory and computational geometry to efficiently and adaptively concentrate high resolution only on critical areas. The current version of the algorithm is designed to find feasible paths or trajectories to goal regions in high-dimensional spaces; future versions will be designed to find solutions that optimize a real-valued criterion. Many simulated problems have been tested, ranging from two-dimensional to nine-dimensional state-spaces, including mazes, path planning, non-linear dynamics, and planar snake robots in restricted spaces. In all cases, a good solution is found in fewer than ten trials and a few minutes.
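The core idea of the decision-tree partitioning can be sketched as follows: begin with a single coarse cell and split only the cells where planning fails, so that resolution concentrates on critical regions. This is a minimal illustrative sketch, not the paper's exact procedure; the axis-aligned cell geometry, the longest-axis split rule, and the `needs_refinement` criterion used here are all simplifying assumptions.

```python
# Sketch of adaptive variable-resolution partitioning: split only cells
# flagged as critical, so resolution concentrates where it is needed.
# (Illustrative assumptions: axis-aligned cells, longest-axis splits.)
from dataclasses import dataclass

@dataclass
class Cell:
    lo: tuple  # lower corner of the axis-aligned cell
    hi: tuple  # upper corner of the axis-aligned cell

    def contains(self, x):
        return all(l <= xi <= h for l, xi, h in zip(self.lo, x, self.hi))

    def split(self):
        # Split along the longest axis, kd-tree style.
        d = max(range(len(self.lo)), key=lambda i: self.hi[i] - self.lo[i])
        mid = 0.5 * (self.lo[d] + self.hi[d])
        left_hi = list(self.hi); left_hi[d] = mid
        right_lo = list(self.lo); right_lo[d] = mid
        return Cell(self.lo, tuple(left_hi)), Cell(tuple(right_lo), self.hi)

def refine(cells, needs_refinement):
    """Replace every flagged cell with its two halves; keep the rest."""
    out = []
    for c in cells:
        if needs_refinement(c):
            out.extend(c.split())
        else:
            out.append(c)
    return out

# Usage: repeatedly refine only cells that extend past a hypothetical
# critical boundary at x = 0.5, leaving the rest of space coarse.
cells = [Cell((0.0, 0.0), (1.0, 1.0))]
for _ in range(3):
    cells = refine(cells, lambda c: c.hi[0] > 0.5)
print(len(cells))  # 5 cells: fine near the boundary, coarse elsewhere
```

In the actual algorithm the refinement signal comes from a game-theoretic planner that detects when the agent cannot guarantee progress toward the goal at the current resolution; the boundary test above merely stands in for that criterion.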
