Markov Decision Processes with Sample Path Constraints
We consider time-average Markov Decision Processes (MDPs), which accumulate a reward and a cost at each decision epoch. A policy meets the sample-path constraint if the time-average cost is below a spe...
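
As a rough illustration of the time-average cost along a single sample path, the Python sketch below computes running averages of a cost sequence and compares the last one with a bound. The function names `running_time_average` and `meets_bound`, the threshold `alpha`, and the example costs are all hypothetical and do not come from the paper. A finite path can only approximate the long-run average, so this is a finite-horizon proxy and not the constraint itself.

```python
from itertools import accumulate


def running_time_average(costs):
    """Return the running averages (1/n) * (sum of the first n costs), n = 1..N."""
    return [total / n for n, total in enumerate(accumulate(costs), start=1)]


def meets_bound(costs, alpha):
    """Finite-horizon proxy: is the final time-average cost at most alpha?

    `alpha` is an assumed threshold chosen for illustration only.
    """
    averages = running_time_average(costs)
    return bool(averages) and averages[-1] <= alpha


# Hypothetical costs observed at four consecutive decision epochs.
path_costs = [2.0, 0.0, 1.0, 1.0]
print(running_time_average(path_costs))
print(meets_bound(path_costs, alpha=1.0))
```

In a simulation one would generate `path_costs` by running a policy on the MDP and watch whether the running average settles below the bound as the horizon grows.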