On Minimizing Ordered Weighted Regrets in Multiobjective Markov Decision Processes

In this paper, we propose an exact solution method to generate fair policies in Multiobjective Markov Decision Processes (MMDPs). MMDPs consider n immediate reward functions, representing either individual payoffs in a multiagent problem or rewards with respect to different objectives. In this context, we focus on determining a policy that fairly shares regrets among agents or objectives, where the regret on each dimension is defined as the opportunity loss with respect to the optimal expected reward. To this end, we propose to minimize the ordered weighted average of regrets (OWR). The OWR criterion extends minimax regret, relaxing strict egalitarianism into a milder notion of fairness. After showing that OWR-optimality is state-dependent and that the Bellman principle does not hold for OWR-optimal policies, we propose a linear programming reformulation of the problem. We also provide experimental results showing the efficiency of our approach.
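As a minimal numerical sketch of the OWR criterion described above: regrets are sorted in nonincreasing order and aggregated with a nonincreasing positive weight vector, so the largest regret receives the largest weight. The policy values and weights below are hypothetical, for illustration only; minimax regret is recovered with weights (1, 0, ..., 0).

```python
def ordered_weighted_regret(values, optima, weights):
    """OWA of regrets: compute per-dimension regrets (opportunity loss
    w.r.t. the optimal expected reward), sort them in nonincreasing
    order, and take the weighted sum with nonincreasing weights."""
    regrets = [opt - v for opt, v in zip(optima, values)]
    ranked = sorted(regrets, reverse=True)
    return sum(w * r for w, r in zip(weights, ranked))

# Hypothetical two-objective example: both objectives have optimum 10.
optima = [10.0, 10.0]
policy_a = [9.0, 5.0]   # unbalanced: regrets (1, 5)
policy_b = [7.0, 7.0]   # balanced:   regrets (3, 3)

# Strictly decreasing weights encode fairness: the balanced policy_b
# is preferred even though both policies have the same total regret.
w = [0.75, 0.25]
owr_a = ordered_weighted_regret(policy_a, optima, w)  # 0.75*5 + 0.25*1 = 4.0
owr_b = ordered_weighted_regret(policy_b, optima, w)  # 0.75*3 + 0.25*3 = 3.0
```

With degenerate weights (1, 0) the same function reduces to the minimax regret criterion, which is indifferent between any two policies sharing the same worst-case regret.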