Markov Decision Processes with Sample Path Constraints

We consider time-average Markov Decision Processes (MDPs), which accumulate a reward and cost at each decision epoch. A policy meets the sample-path constraint if the time-average cost is below a specified value.
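
As a minimal sketch of this setting (the symbols $X_t$, $A_t$, $r$, $c$, and the threshold $\alpha$ are assumed notation, not taken from the abstract above), the sample-path constraint on a policy can be written as a requirement on the long-run average cost along almost every trajectory, while the objective is the long-run average reward:
\[
  \limsup_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} c(X_t, A_t) \le \alpha
  \quad \text{with probability one},
\]
\[
  \text{maximize} \quad \liminf_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} r(X_t, A_t),
\]
where $X_t$ is the state and $A_t$ the action at decision epoch $t$, $r$ and $c$ are the per-epoch reward and cost, and $\alpha$ is the specified constraint level.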