A reinforcement learning method using a dynamic reinforcement function based on action selection probability
In this paper, the authors propose Dynamic Profit Sharing, a reinforcement learning method in which the reinforcement function of Profit Sharing (PS) is changed dynamically based on action selection probabilities. While the rationality theorem in Profit Sharing gives a necessary and sufficient condition for obtaining rational solutions [1], the proposed method gives a condition for improving the learning efficiency while stochastically maintaining sufficient rationality. By dynamically determining a reinforcement function that satisfies this condition, the reward distribution efficiency can be increased and learning can be accomplished quickly even in an environment in which many actions are required before the goal state is reached. The authors perform experiments using maze and pursuit problems as examples to verify the effectiveness of the proposed method. © 2007 Wiley Periodicals, Inc. Syst Comp Jpn, 38(7): 1–11, 2007; Published online in Wiley InterScience. DOI 10.1002/scj.20738
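The abstract does not state the dynamic reinforcement function itself, so the following is only a minimal sketch of the fixed baseline that it modifies. It assumes a geometrically decreasing reinforcement function, a common choice in the Profit Sharing literature; the names `geometric_credit`, `profit_sharing_update`, and `ratio`, as well as the toy episode, are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def geometric_credit(reward, steps_from_goal, ratio):
    """Credit for the rule fired `steps_from_goal` steps before the reward.

    Assumption: a fixed geometric reinforcement function,
    f(i) = reward * ratio**i, with 0 < ratio < 1.
    """
    return reward * ratio ** steps_from_goal


def profit_sharing_update(weights, episode, reward, ratio):
    """Reinforce every (state, action) rule in the episode, newest first."""
    for steps_from_goal, rule in enumerate(reversed(episode)):
        weights[rule] += geometric_credit(reward, steps_from_goal, ratio)
    return weights


# Toy episode (hypothetical): three rules fired before the goal is reached.
weights = defaultdict(float)
episode = [("s0", "right"), ("s1", "right"), ("s2", "up")]
profit_sharing_update(weights, episode, reward=1.0, ratio=0.25)

# Credit that the first rule of a 20-step episode would receive.
early_credit = geometric_credit(1.0, 19, 0.25)
```

With `ratio=0.25`, the rule fired 19 steps before the goal receives a credit of about 3.6e-12, which illustrates why a fixed, steeply decreasing function distributes reward poorly in long episodes. This is the inefficiency the abstract says the dynamic function is meant to reduce.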