Paper Information - A Policy Representation Using Weighted Multiple Normal Distribution

In this paper, we attempt to solve a reinforcement learning problem for a 5-linked ring robot in real time, so that the real robot can withstand the trial and error that learning requires. On this robot, noisy sensors and cheap position-control motor systems cause incomplete perception problems. This incomplete perception also causes the optimum actions to vary as learning progresses. To cope with this problem, we adopt an actor-critic method and propose a new hierarchical policy representation scheme that consists of discrete action selection at the top level and continuous action selection at the low level of the hierarchy. The proposed hierarchical scheme accelerates learning in a continuous action space, and it can track the optimum actions as they vary with the progress of learning in our robotics problem. This paper compares and discusses several learning algorithms through simulations and demonstrates the application of the proposed method to the real robot.
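The two-level action selection described in the abstract can be illustrated with a minimal sketch. It assumes the top level is a softmax over component preferences and the low level is a one-dimensional normal distribution per component. These choices, and the names `softmax`, `sample_action`, `density`, `prefs`, `means`, and `stds`, are hypothetical and are not taken from the paper. The actor-critic updates are omitted.

```python
import math
import random


def softmax(prefs):
    """Turn unnormalized preferences into selection probabilities (assumed form)."""
    m = max(prefs)  # subtract the maximum for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(prefs, means, stds, rng):
    """Top level: pick a component index. Low level: draw a continuous action."""
    probs = softmax(prefs)
    k = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    action = rng.gauss(means[k], stds[k])
    return k, action


def density(action, prefs, means, stds):
    """Density of a continuous action under the weighted sum of normals."""
    probs = softmax(prefs)
    return sum(
        w * math.exp(-0.5 * ((action - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, mu, s in zip(probs, means, stds)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Three hypothetical components over a single joint command.
    k, a = sample_action([0.0, 1.0, 2.0], [-1.0, 0.0, 1.0], [0.2, 0.2, 0.2], rng)
    print(k, a)
```

In this sketch the discrete choice narrows the search to one region of the action space, and the normal distribution around the chosen mean supplies the continuous exploration within it.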
