Reinforcement learning based on a statistical value function and its application to a board game
A statistical method for reinforcement learning is proposed to cope with a large number of discrete states. As a coarse-graining, the states are grouped into a smaller number of state sets, each covering a neighbourhood of states. The state sets partly overlap, so a single state belongs to multiple sets. Learning is based on an action-value function for each state set, and the action-value function for an individual state is derived as a statistical average of the value functions of the sets containing it. The proposed method is applied to the board game Dots-and-Boxes. Simulations show successful learning through training games against a mini-max opponent with search depth 2 to 5, and the winning rate against a depth-3 mini-max reaches about 80%. An action-value function derived by a weighted average, with weights given by the variance of rewards, shows an advantage over one derived by a simple average.
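The variance-weighted combination described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the inverse-variance weighting scheme, and the epsilon guard are assumptions; the paper only states that the weights are given by the variance of rewards.

```python
import statistics

def state_action_value(set_values, reward_samples):
    """Combine per-set action values into one per-state value.

    set_values     -- action value Q(S_i, a) for each state set S_i
                      that contains the state (hypothetical notation)
    reward_samples -- rewards observed in each set, used to estimate
                      the variance of rewards per set
    """
    weights = []
    for rewards in reward_samples:
        var = statistics.pvariance(rewards)
        # Inverse-variance weight: low-variance (more reliable) sets
        # contribute more. Small epsilon guards against zero variance.
        weights.append(1.0 / (var + 1e-9))
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, set_values)) / total

# Example: the set with consistent rewards (zero variance) dominates,
# so the combined value is pulled toward 0.8 rather than the mean 0.5.
q = state_action_value(
    set_values=[0.8, 0.2],
    reward_samples=[[1.0, 1.0, 1.0], [1.0, -1.0, 1.0, -1.0]],
)
```

A simple average, by contrast, would weight both sets equally regardless of how noisy their reward histories are, which is the baseline the paper compares against.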