About Q-values of Monte Carlo method

Profit sharing is a reinforcement learning method. It can work well on partially observable Markov decision processes (POMDPs) because it is a typical non-bootstrap method and its Q-values are usually updated accumulatively. Profit sharing, however, does not work well under probabilistic state transitions. In this paper, we propose a novel learning method that works well under probabilistic state transitions. It is similar to the Monte Carlo method, so we discuss the Q-values of our proposed method. In an environment with deterministic state transitions, we show that conventional profit sharing and the proposed method achieve the same performance. We also show that the proposed method performs well compared with conventional profit sharing.
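
To make the distinction between accumulative and Monte Carlo style Q-values concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a geometric credit function with a hypothetical `decay` parameter for profit sharing, an undiscounted every-visit Monte Carlo average for comparison, and a single terminal reward per episode. The function names, the toy episode, and the reward sequence are all illustrative assumptions.

```python
from collections import defaultdict


def profit_sharing_update(q, episode, reward, decay=0.5):
    """Accumulate decayed credit along the episode, walking back from the reward.

    Assumption: a geometric credit function; `decay` is a hypothetical parameter.
    """
    credit = reward
    for state, action in reversed(episode):
        q[(state, action)] += credit
        credit *= decay


def monte_carlo_update(q, counts, episode, reward):
    """Every-visit Monte Carlo: keep a running mean of the undiscounted return."""
    for state, action in episode:
        key = (state, action)
        counts[key] += 1
        q[key] += (reward - q[key]) / counts[key]


ps_q = defaultdict(float)
mc_q = defaultdict(float)
mc_counts = defaultdict(int)

# Toy setting (assumed): the same state-action sequence is rewarded
# in only half of the episodes, mimicking a probabilistic outcome.
episode = [("s0", "a0"), ("s1", "a1")]
for reward in (1.0, 0.0, 1.0, 0.0):
    profit_sharing_update(ps_q, episode, reward)
    monte_carlo_update(mc_q, mc_counts, episode, reward)

print("profit sharing:", dict(ps_q))
print("monte carlo:   ", dict(mc_q))
```

In this sketch the accumulated values keep growing with the number of rewarded episodes, while the averaged values settle near the mean observed return. Neither update bootstraps from another state's estimate.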