Markov Decision Processes with Multiple Objectives

We consider Markov decision processes (MDPs) with multiple discounted reward objectives. Such MDPs occur in design problems where one wishes to optimize several criteria simultaneously, for example, latency and power. The possible trade-offs between the different objectives are characterized by the Pareto curve. We show that every Pareto-optimal point can be achieved by a memoryless strategy; however, unlike in the single-objective case, the memoryless strategy may require randomization. Moreover, we show that the Pareto curve can be approximated in time polynomial in the size of the MDP. Additionally, we study the problem of whether a given value vector is realizable by some strategy, and show that it can be decided in polynomial time; by contrast, deciding whether it is realizable by a deterministic memoryless strategy is NP-complete. These results provide efficient algorithms for design exploration in MDP models with multiple objectives.
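The claim that memoryless strategies may need randomization can be seen on a very small example. The sketch below uses a hypothetical one-state MDP that is not taken from the paper. It has two self-loop actions: `a` pays reward vector (1, 0) and `b` pays (0, 1). The discount factor is assumed to be 0.9 for both objectives. Every step pays total reward 1, so every strategy's two values sum to 1/(1 - 0.9) = 10, and every value vector on that line is Pareto-optimal. The helper names (`evaluate`, `deterministic_policies`, `close`) are illustrative only. The enumeration of deterministic memoryless policies is brute force and takes exponentially many steps in the number of states. That fits the NP-completeness result, but it is not the paper's algorithm.

```python
from itertools import product

# Hypothetical toy MDP (illustration only, not from the paper):
# one state "s" with two self-loop actions whose rewards conflict.
STATES = ["s"]
ACTIONS = {"s": ["a", "b"]}
TRANS = {("s", "a"): {"s": 1.0}, ("s", "b"): {"s": 1.0}}   # P(next | state, action)
REWARD = {("s", "a"): (1.0, 0.0), ("s", "b"): (0.0, 1.0)}  # one entry per objective
LAMBDA = 0.9  # assumed discount factor, shared by both objectives
K = 2         # number of objectives


def evaluate(policy, start, lam=LAMBDA, iters=2000):
    """Discounted value vector of a memoryless (possibly randomized) policy.

    `policy` maps each state to {action: probability}. For each objective i we
    iterate V_i(s) = sum_a pi(a|s) * (r_i(s,a) + lam * sum_t P(t|s,a) * V_i(t)),
    which converges geometrically because lam < 1.
    """
    v = {s: [0.0] * K for s in STATES}
    for _ in range(iters):
        new = {}
        for s in STATES:
            acc = [0.0] * K
            for act, p_act in policy[s].items():
                r = REWARD[(s, act)]
                for i in range(K):
                    future = sum(p * v[t][i] for t, p in TRANS[(s, act)].items())
                    acc[i] += p_act * (r[i] + lam * future)
            new[s] = acc
        v = new
    return tuple(v[start])


def deterministic_policies():
    """Yield every deterministic memoryless policy (one fixed action per state)."""
    for choice in product(*(ACTIONS[s] for s in STATES)):
        yield {s: {act: 1.0} for s, act in zip(STATES, choice)}


def close(u, w, tol=1e-6):
    """Component-wise comparison of two value vectors up to a tolerance."""
    return all(abs(x - y) <= tol for x, y in zip(u, w))


# Target: split the total discounted reward of 10 evenly between the objectives.
target = (5.0, 5.0)

det_values = [evaluate(pol, "s") for pol in deterministic_policies()]
mixed = {"s": {"a": 0.5, "b": 0.5}}  # randomized memoryless policy
mixed_value = evaluate(mixed, "s")

print([tuple(round(x, 6) for x in val) for val in det_values])  # [(10.0, 0.0), (0.0, 10.0)]
print(tuple(round(x, 6) for x in mixed_value))                  # (5.0, 5.0)
print(any(close(val, target) for val in det_values), close(mixed_value, target))  # False True
```

Only the two extreme points (10, 0) and (0, 10) are reachable by deterministic memoryless policies here. The Pareto-optimal point (5, 5) needs a coin flip at each step. This sketch does not show how to decide realizability by arbitrary strategies in polynomial time. That requires a different technique, for example linear programming, which is not attempted here.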
