Markov reinforcement learning driven by utility

This paper puts forward an extended model of Q-learning and discusses utility-driven Markov reinforcement learning. In contrast to learning algorithms with a single absorbing state, the learning target is not a state but the maximization of the agent's average utility in each decision process. The learning result is always a cycle that lets the agent acquire maximal rewards. Convergence of Q-learning is proved, and simulations on image grids indicate that the learning result is a cycle.
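As a rough illustration of learning without an absorbing state, the following minimal sketch runs standard tabular Q-learning with discounting on a small grid and then follows the greedy policy until a state repeats. It is not the extended model proposed in the paper: the 3x3 grid, the `UTILITY` values, the discount factor, the learning rate, and all function names are assumptions chosen only for this example.

```python
import random

# Hypothetical 3x3 grid; UTILITY[r][c] is collected when the agent enters cell (r, c).
UTILITY = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 2.0],
    [0.0, 0.0, 0.0],
]
ROWS, COLS = 3, 3
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Deterministic move; hitting a wall leaves the agent in place with zero utility."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < ROWS and 0 <= nc < COLS:
        return (nr, nc), UTILITY[nr][nc]
    return state, 0.0


def q_learning(steps=20000, gamma=0.9, alpha=1.0, seed=0):
    """Standard discounted Q-learning with uniform random exploration (assumed settings)."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    state = (0, 0)
    for _ in range(steps):
        action = rng.randrange(len(ACTIONS))
        nxt, reward = step(state, action)
        target = reward + gamma * max(q[nxt])
        q[state][action] += alpha * (target - q[state][action])
        state = nxt  # no absorbing state, so the walk never terminates
    return q


def greedy_cycle(q, start):
    """Follow the greedy policy until a state repeats; return the loop and its rewards."""
    path, rewards, seen = [], [], {}
    state = start
    while state not in seen:
        seen[state] = len(path)
        path.append(state)
        action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
        state, reward = step(state, action)
        rewards.append(reward)
    first = seen[state]
    return path[first:], rewards[first:]


q = q_learning()
cycle, cycle_rewards = greedy_cycle(q, start=(0, 0))
average_utility = sum(cycle_rewards) / len(cycle_rewards)
print("cycle:", cycle)
print("average utility per step:", average_utility)
```

In this toy setting the greedy trajectory settles into a loop between the two cells with positive utility rather than stopping at a goal state, which mirrors the qualitative claim of the abstract. The discounted update is used here only as a familiar stand-in for an average-utility criterion.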