Mean-Variance Tradeoffs in an Undiscounted MDP: The Unichain Case

This paper analyzes the computation of Pareto optima, in the sense of high mean and low variance of the stationary distribution, in the unichain, undiscounted Markov decision process (MDP).
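To make the two criteria concrete, the following is a minimal sketch of how the mean and variance might be evaluated for one fixed stationary policy. It assumes, for illustration only, that the policy induces a unichain transition matrix P and a deterministic per-state reward vector r, and that the variance is taken over the reward under the stationary distribution. The function names and the two-state example are hypothetical and do not come from the paper.

```python
import numpy as np


def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 for a unichain transition matrix P."""
    n = P.shape[0]
    # Stack the balance equations (P^T - I) pi = 0 with the normalization row.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    # The system is consistent and has a unique solution in the unichain case,
    # so least squares recovers it.
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def mean_variance(P, r):
    """Mean and variance of the reward r under the stationary distribution of P."""
    pi = stationary_distribution(P)
    mean = float(pi @ r)
    var = float(pi @ (r - mean) ** 2)
    return mean, var


# Hypothetical two-state chain induced by a single fixed policy (assumed data).
P = np.array([[0.5, 0.5],
              [0.5, 0.5]])
r = np.array([0.0, 2.0])
mean, var = mean_variance(P, r)
print(mean, var)
```

Under these assumptions, a policy is Pareto optimal if no other policy attains a mean at least as high and a variance at least as low, with one of the two strictly better.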