A study on abstract policy for acceleration of reinforcement learning

Reinforcement learning (RL) is well known as a method that can be applied to unknown problems. However, because optimization at every state requires trial and error, the learning time grows large when the environment has many states. If solutions to similar problems exist and are reused during exploration, some of the trial and error can be spared and learning can be completed in a shorter time. In this paper, the authors propose to reuse an abstract policy, a representative of a solution constructed by the learning vector quantization (LVQ) algorithm, to improve the initial performance of an RL learner in a similar but different problem. Furthermore, it is investigated whether the policy can adapt to a new environment while preserving its performance in the old environments. Simulations show good results in terms of learning acceleration and the adaptation of the abstract policy.
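To make the idea concrete, the sketch below illustrates one plausible reading of the approach: an LVQ-style abstract policy, i.e. labelled prototypes over state features, is trained from a known solution and then used to bias the exploratory actions of a tabular Q-learner in a new, similar environment. This is a minimal illustration, not the authors' exact formulation; the environment interface (`reset`, `step`, `features`), the LVQ1 update, and the `reuse_prob` mixing scheme are all assumptions introduced here.

```python
import numpy as np

class AbstractPolicy:
    """LVQ-style abstract policy: action-labelled prototypes over state features.
    Hypothetical sketch; prototype initialization, learning rate, and the way the
    policy is reused are assumptions rather than the paper's exact method."""

    def __init__(self, n_prototypes, n_features, n_actions, lr=0.05, seed=0):
        self.rng = np.random.default_rng(seed)
        self.prototypes = self.rng.normal(size=(n_prototypes, n_features))
        self.labels = self.rng.integers(0, n_actions, size=n_prototypes)
        self.lr = lr

    def fit_step(self, state, action):
        """One LVQ1 update from a (state, action) pair of the source solution:
        attract the nearest prototype if its label matches the demonstrated
        action, repel it otherwise."""
        i = int(np.argmin(np.linalg.norm(self.prototypes - state, axis=1)))
        sign = 1.0 if self.labels[i] == action else -1.0
        self.prototypes[i] += sign * self.lr * (state - self.prototypes[i])

    def suggest(self, state):
        """Return the action attached to the nearest prototype."""
        i = int(np.argmin(np.linalg.norm(self.prototypes - state, axis=1)))
        return int(self.labels[i])


def q_learning_with_reuse(env, abstract, n_episodes=200,
                          alpha=0.1, gamma=0.99,
                          epsilon=0.1, reuse_prob=0.5, seed=1):
    """Tabular Q-learning that, when exploring, follows the abstract policy with
    probability `reuse_prob` instead of acting uniformly at random.
    `env` is assumed to expose n_states, n_actions, reset(), step(a) -> (s, r, done),
    and features(s) mapping a state to the prototype feature space."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((env.n_states, env.n_actions))
    for _ in range(n_episodes):
        s, done = env.reset(), False
        while not done:
            if rng.random() < epsilon:
                if rng.random() < reuse_prob:
                    a = abstract.suggest(env.features(s))   # reuse the old solution
                else:
                    a = int(rng.integers(env.n_actions))    # ordinary random exploration
            else:
                a = int(np.argmax(Q[s]))
            s2, r, done = env.step(a)
            Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) * (not done) - Q[s, a])
            s = s2
    return Q
```

Under this reading, the abstract policy supplies reasonable actions early on, so the learner wastes fewer episodes on uninformed exploration; as the Q-table improves, greedy actions gradually take over, which is how the learner can still adapt to differences between the new and old environments.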