Paper information - About an initial value of Q-value in Profit Sharing

Profit sharing, a reinforcement learning method, distributes reward to the Q-values of rules. The Q-value that profit sharing uses for action selection holds the reward its rule has received. In this paper, we discuss the initial value of the Q-value and propose a method for setting it. If the initial value is much larger than the distributed reward, action selection always behaves like random selection. If the initial value is too small, action selection outputs only the one action that was learned at the beginning. To resolve these problems, a value that causes neither problem must be set for each state. We therefore propose a method for setting the initial Q-value at each state. The experiment shows that this method performs better than the conventional method.
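The two failure cases above can be illustrated with a minimal sketch. It assumes roulette-wheel selection, in which an action is chosen with probability proportional to its Q-value; the abstract does not state the selection rule, so this is an assumption. The function names, the four actions, and the numeric values are hypothetical and chosen only for illustration. The sketch does not implement the setting method proposed in the paper.

```python
def roulette_probabilities(q_values):
    """Selection probability of each action, proportional to its Q-value (assumed rule)."""
    total = sum(q_values)
    return [q / total for q in q_values]


def after_one_reward(initial_value, reward, n_actions=4):
    """Probabilities in one state after only action 0 has received `reward`."""
    # Every rule in the state starts from the same initial Q-value.
    q_values = [initial_value] * n_actions
    # Profit sharing adds the distributed reward to the rule that was used.
    q_values[0] += reward
    return roulette_probabilities(q_values)


# Initial value far larger than the reward: selection stays close to uniform.
large = after_one_reward(initial_value=1000.0, reward=1.0)

# Initial value far smaller than the reward: the first rewarded action dominates.
small = after_one_reward(initial_value=0.001, reward=1.0)

print("large initial value:", large)
print("small initial value:", small)
```

With the large initial value, every action keeps a probability close to 1/4, so the reward has almost no effect on selection. With the small initial value, the first rewarded action is chosen almost every time, so the other actions are rarely explored.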

W. Uemura | A. Ueno | S. Tatsumi
