The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-spaces

Parti-game is a new algorithm for learning feasible trajectories to goal regions in high-dimensional continuous state-spaces. In high dimensions it is essential that neither planning nor exploration be performed uniformly over the state-space. Parti-game maintains a decision-tree partitioning of the state-space and applies techniques from game theory and computational geometry to efficiently and adaptively concentrate high resolution only on critical areas. The current version of the algorithm finds feasible paths or trajectories to goal regions; future versions will be designed to find solutions that optimize a real-valued criterion. Many simulated problems have been tested, ranging from two-dimensional to nine-dimensional state-spaces, including mazes, path planning, non-linear dynamics, and planar snake robots in restricted spaces. In all cases, a good solution is found in fewer than ten trials and within a few minutes.
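The abstract names three ingredients: a decision-tree partition of state-space, game-theoretic reasoning, and computational geometry. The following is a minimal sketch of how these might fit together. It rests on assumptions the abstract does not state. Cells are axis-aligned boxes halved along their widest side (a kd-tree-style split). The cells observed after trying an action are treated adversarially, so a cell's cost is a minimax number of transitions to the goal. Cells on the border between finite-cost and infinite-cost regions are the candidates for further splitting. The names `Cell`, `adjacent`, `minimax_costs`, `border_cells` and the action labels are hypothetical and are not taken from the original algorithm.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """An axis-aligned box (a leaf of a kd-tree-style partition); frozen so it can key a dict."""
    lo: tuple
    hi: tuple

    def split(self):
        """Halve the box along its widest dimension and return the two children."""
        widths = [h - l for l, h in zip(self.lo, self.hi)]
        d = widths.index(max(widths))
        mid = (self.lo[d] + self.hi[d]) / 2.0
        left_hi = list(self.hi)
        left_hi[d] = mid
        right_lo = list(self.lo)
        right_lo[d] = mid
        return Cell(self.lo, tuple(left_hi)), Cell(tuple(right_lo), self.hi)


def adjacent(a, b):
    """True if boxes a and b share part of a face (they touch in exactly one dimension)."""
    touching = 0
    for al, ah, bl, bh in zip(a.lo, a.hi, b.lo, b.hi):
        if ah == bl or bh == al:
            touching += 1
        elif min(ah, bh) <= max(al, bl):
            return False  # separated by a gap in this dimension
    return touching == 1


def minimax_costs(cells, goal, outcomes):
    """Worst-case number of cell transitions needed to reach a goal cell.

    outcomes[c][a] is the set of cells observed after trying action a in cell c
    (an assumption of this sketch); an adversary picks the worst of them:
        J(c) = 0 for goal cells, else min_a max_{c' in outcomes[c][a]} (1 + J(c')).
    J(c) stays math.inf when no action is guaranteed to lead to the goal.
    """
    J = {c: (0.0 if c in goal else math.inf) for c in cells}
    for _ in range(len(cells)):  # values only decrease; |cells| sweeps suffice here
        changed = False
        for c in cells:
            if c in goal:
                continue
            best = math.inf
            for results in outcomes.get(c, {}).values():
                if results:
                    best = min(best, max(1 + J[r] for r in results))
            if best < J[c]:
                J[c] = best
                changed = True
        if not changed:
            break
    return J


def border_cells(cells, J):
    """Cells adjacent to a cell on the other side of the finite/infinite cost boundary."""
    return {
        a for a in cells for b in cells
        if a != b and adjacent(a, b) and math.isinf(J[a]) != math.isinf(J[b])
    }


if __name__ == "__main__":
    root = Cell((0.0, 0.0), (1.0, 1.0))
    left, right = root.split()   # unit square: split along x
    bottom, top = right.split()  # right half is taller than wide: split along y
    cells = [left, bottom, top]
    outcomes = {
        left: {"aim_top": {bottom, top}, "aim_bottom": {bottom}},
        bottom: {"aim_top": {top}},
    }
    J = minimax_costs(cells, goal={top}, outcomes=outcomes)
    print(J[left], J[bottom], J[top])  # 2.0 1.0 0.0
```

Treating the observed outcomes adversarially means a cell counts as solved only if every outcome seen for its chosen action leads on to the goal. Under this sketch's assumptions, refining only the cells returned by `border_cells` keeps the partition coarse wherever the coarse plan already succeeds.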
