In this chapter, we introduce the profit sharing method (Grefenstette, 1988; Miyazaki et al., 1994a), a reinforcement learning method. Profit sharing can work well in a partially observable Markov decision process (POMDP), where a learning agent receives the same observation in states that require different actions and so cannot distinguish them, because it is a typical non-bootstrap method and its Q-values are usually accumulated. We therefore study profit sharing as a basis for next-generation reinforcement learning systems. First, we discuss how to assign credit to a rule in a POMDP. The conventional reinforcement function of profit sharing does not take POMDPs into account, so we propose a novel credit assignment that considers the conditions on reward distribution in a POMDP. Second, we discuss probabilistic state transitions in an MDP. Profit sharing does not work well under probabilistic state transitions, so we propose a novel learning method that takes them into account. Because this method is similar to the Monte Carlo method, we also discuss its Q-values. We show that, in an environment with deterministic state transitions, conventional profit sharing and the proposed method perform equally well, and that the proposed method performs well in comparison with conventional profit sharing.

In short, this chapter addresses learning in POMDPs and under probabilistic state transitions. We show the advantages and disadvantages of the profit sharing method, propose a novel learning method that keeps the advantages while resolving the disadvantages, and propose how to handle the Q-values in action selection. Section 2 introduces conventional reinforcement learning methods and the profit sharing method. Section 3 proposes the novel learning method. Section 4 presents the results, and Section 5 concludes the chapter.
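As a rough illustration of what "non-bootstrap" and "accumulated" mean here, the following minimal sketch contrasts a conventional profit sharing update with an every-visit Monte Carlo average. It is not the method proposed in this chapter. The geometrically decreasing reinforcement function, the decay value of 0.5, the function names, and the toy episode are all assumptions made for the example.

```python
from collections import defaultdict


def profit_sharing_update(weights, episode, reward, decay=0.5):
    """Add credit to every (observation, action) rule fired in one episode.

    Assumption: a geometrically decreasing reinforcement function, so the
    rule fired i steps before the reward receives reward * decay**i.
    """
    credit = reward
    for rule in reversed(episode):
        # Accumulative update: credit is only added, and it never depends
        # on the estimated value of a successor state (no bootstrapping).
        weights[rule] += credit
        credit *= decay
    return weights


def monte_carlo_update(values, counts, episode, reward, gamma=0.5):
    """Every-visit Monte Carlo: running average of the discounted return."""
    ret = reward
    for rule in reversed(episode):
        counts[rule] += 1
        values[rule] += (ret - values[rule]) / counts[rule]
        ret *= gamma
    return values


# Hypothetical episode: three rules fired in order, reward 1.0 at the end.
episode = [("o1", "right"), ("o2", "right"), ("o3", "up")]

weights = defaultdict(float)
values, counts = defaultdict(float), defaultdict(int)
for _ in range(2):  # replay the same episode twice
    profit_sharing_update(weights, episode, reward=1.0)
    monte_carlo_update(values, counts, episode, reward=1.0)

print(dict(weights))  # profit sharing weights grow with every rewarded episode
print(dict(values))   # Monte Carlo values stay at the average return
```

Replaying the same episode makes the difference visible: the profit sharing weights keep growing, whereas the Monte Carlo estimates settle at the average return. This is one way to see why the handling of values matters for action selection.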
[1] John J. Grefenstette, et al. Credit assignment in rule discovery systems based on genetic algorithms. Machine Learning, 1988.
[2] Peter Dayan, et al. Technical Note: Q-Learning. Machine Learning, 2004.
[3] Richard S. Sutton, et al. Reinforcement Learning: An Introduction. IEEE Trans. Neural Networks, 1998.
[4] Dana H. Ballard, et al. Active Perception and Reinforcement Learning. Neural Computation, 1990.
[5] W. Uemura. About distributing rewards to a rule with probabilistic state transition. SICE Annual Conference 2007, 2007.
[6] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming. ML, 1990.