Weighted reward criteria in Competitive Markov Decision Processes
We consider Competitive Markov Decision Processes in which the controllers/players are antagonistic and aggregate their sequences of expected rewards according to “weighted” or “horizon-sensitive” criteria. These are either a convex combination of two discounted objectives, or of one discounted and one limiting average reward objective. In both cases we establish the existence of the game-theoretic value vector, and supply a description of ε-optimal non-stationary strategies.
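As a rough illustration of the two criteria, one possible formalization is sketched below. The notation is an assumption made for this sketch and is not taken from the abstract: $s$ is the initial state, $\pi$ and $\sigma$ are the strategies of the two players, $r_t$ is the reward at stage $t$, $\lambda \in [0,1]$ is the weight, and $\beta, \beta_1, \beta_2 \in [0,1)$ are discount factors. The normalization by $(1-\beta)$ and the use of $\liminf$ for the limiting average are common conventions, assumed here rather than confirmed by the source.

```latex
% Hypothetical notation; a sketch of the weighted criteria, not the authors' exact definitions.
% Normalized discounted payoff and limiting average payoff from initial state s:
\[
  v_{\beta}(s,\pi,\sigma) = (1-\beta)\sum_{t=0}^{\infty} \beta^{t}\,
      \mathbb{E}_{s,\pi,\sigma}[r_t],
  \qquad
  v_{a}(s,\pi,\sigma) = \liminf_{T\to\infty} \frac{1}{T+1}\sum_{t=0}^{T}
      \mathbb{E}_{s,\pi,\sigma}[r_t].
\]
% Convex combination of two discounted objectives:
\[
  w_{1}(s,\pi,\sigma) = \lambda\, v_{\beta_1}(s,\pi,\sigma)
      + (1-\lambda)\, v_{\beta_2}(s,\pi,\sigma).
\]
% Convex combination of one discounted and one limiting average objective:
\[
  w_{2}(s,\pi,\sigma) = \lambda\, v_{\beta}(s,\pi,\sigma)
      + (1-\lambda)\, v_{a}(s,\pi,\sigma).
\]
```

Under this reading, the discounted term emphasizes early stages and the second term emphasizes later ones, which is one way to see why optimal play may need to change over time, consistent with the abstract's focus on non-stationary strategies.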