Risk-Sensitive Markov Decision Processes
This paper considers the maximization of certain equivalent reward generated by a Markov decision process with constant risk sensitivity. First, value iteration is used to optimize possibly time-varying processes of finite duration. Then a policy iteration procedure is developed to find the stationary policy with highest certain equivalent gain for the infinite duration case. A simple example demonstrates both procedures.
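The finite-duration value iteration mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "constant risk sensitivity" corresponds to an exponential utility with risk-aversion coefficient `gamma`, so that the certain equivalent of a lottery X is -(1/gamma) ln E[exp(-gamma X)]. The function names, the data layout `P[i][a][j]` and `R[i][a][j]`, and the two-state numbers are all hypothetical and are not taken from the paper.

```python
import math

def certain_equivalent(lottery, gamma):
    """Certain equivalent of a lottery [(prob, reward), ...].

    Assumes exponential utility with risk-aversion coefficient gamma;
    gamma == 0 is treated as the risk-neutral (expected value) case.
    """
    if gamma == 0:
        return sum(p * x for p, x in lottery)
    return -math.log(sum(p * math.exp(-gamma * x) for p, x in lottery)) / gamma

def value_iteration(P, R, gamma, horizon):
    """Backward recursion on certain equivalents for a finite horizon.

    P[i][a][j]: probability of moving from state i to j under action a.
    R[i][a][j]: reward earned on that transition.
    Returns the certain equivalents per state with `horizon` stages left,
    and the decisions per stage (policy[0] is for one stage remaining).
    """
    n_states = len(P)
    v = [0.0] * n_states  # terminal certain equivalents, assumed zero
    policy = []
    for _ in range(horizon):
        new_v, decisions = [], []
        for i in range(n_states):
            best_a, best_ce = None, -math.inf
            for a in range(len(P[i])):
                # Each successor j yields the transition reward plus the
                # certain equivalent of continuing from j.
                lottery = [(P[i][a][j], R[i][a][j] + v[j])
                           for j in range(n_states)]
                ce = certain_equivalent(lottery, gamma)
                if ce > best_ce:
                    best_a, best_ce = a, ce
            new_v.append(best_ce)
            decisions.append(best_a)
        v = new_v
        policy.append(decisions)
    return v, policy

# Hypothetical two-state, two-action process (illustrative numbers only).
P = [[[0.5, 0.5], [0.8, 0.2]],
     [[0.4, 0.6], [0.7, 0.3]]]
R = [[[9.0, 3.0], [4.0, 4.0]],
     [[3.0, -7.0], [1.0, -19.0]]]

v_neutral, _ = value_iteration(P, R, gamma=0.0, horizon=5)
v_averse, policy_averse = value_iteration(P, R, gamma=0.5, horizon=5)
```

Under these assumptions, a risk-averse decision maker (`gamma > 0`) never values a state above its risk-neutral expected value, so each entry of `v_averse` is at most the matching entry of `v_neutral`. The decisions may differ from stage to stage, which is why the sketch records one decision list per stage.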
[1] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.