Adaptive state space partitioning for reinforcement learning

The convergence properties of reinforcement learning have been extensively investigated in the machine learning literature; however, its application to real-world problems remains constrained by computational complexity. This paper presents a novel algorithm that improves the applicability and efficacy of reinforcement learning through adaptive state space partitioning. The proposed temporal difference learning with adaptive vector quantization (TD-AVQ) is an online algorithm that assumes no a priori knowledge of the learning task or the environment. Because it partitions the state space using information already generated by the reinforcement learning process, no additional computation is required to decide how to partition a given state space. A series of simulations demonstrates the practical value and performance of the proposed algorithm in solving robot motion planning problems.
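To make the idea concrete, the following is a minimal Python sketch of temporal difference learning over a state space partitioned online by vector quantization. The class name TDAVQSketch, the distance-threshold rule for adding codewords, and all hyper-parameter values are illustrative assumptions rather than the published TD-AVQ procedure; they are included only to show how a partition can grow from quantities the learning process already produces.

    import numpy as np

    class TDAVQSketch:
        """Sketch of TD learning over a vector-quantized state space.
        The splitting rule and hyper-parameters are illustrative
        assumptions, not the published TD-AVQ algorithm."""

        def __init__(self, n_actions, dist_threshold=0.5,
                     alpha=0.1, gamma=0.95, codebook_lr=0.05):
            self.n_actions = n_actions
            self.dist_threshold = dist_threshold  # when to add a new codeword
            self.alpha = alpha                    # TD learning rate
            self.gamma = gamma                    # discount factor
            self.codebook_lr = codebook_lr        # codeword adaptation rate
            self.codebook = []                    # prototype vectors (partition cells)
            self.q = []                           # one row of Q-values per codeword

        def _quantize(self, state):
            """Map a continuous state to the nearest codeword, creating a new
            codeword (and Q row) when no existing one is close enough."""
            state = np.asarray(state, dtype=float)
            if self.codebook:
                dists = [np.linalg.norm(state - c) for c in self.codebook]
                k = int(np.argmin(dists))
                if dists[k] <= self.dist_threshold:
                    # Nudge the winning codeword toward the observed state.
                    self.codebook[k] += self.codebook_lr * (state - self.codebook[k])
                    return k
            # No sufficiently close codeword: grow the partition adaptively.
            self.codebook.append(state.copy())
            self.q.append(np.zeros(self.n_actions))
            return len(self.codebook) - 1

        def act(self, state, epsilon=0.1):
            """Epsilon-greedy action selection on the quantized state."""
            k = self._quantize(state)
            if np.random.rand() < epsilon:
                return np.random.randint(self.n_actions)
            return int(np.argmax(self.q[k]))

        def update(self, state, action, reward, next_state, done):
            """One-step TD update (Q-learning style) on the quantized states."""
            k = self._quantize(state)
            k_next = self._quantize(next_state)
            target = reward if done else reward + self.gamma * np.max(self.q[k_next])
            self.q[k][action] += self.alpha * (target - self.q[k][action])

Note that the only signal this sketch uses to grow and adapt the partition is the distance between an observed state and the existing codewords, which is computed anyway during quantization; this mirrors the abstract's claim that partitioning decisions can reuse information the learning algorithm already generates.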
