Efficient Methods for Multi-Objective Decision-Theoretic Planning

In decision-theoretic planning problems, such as (partially observable) Markov decision processes or coordination graphs, agents typically aim to optimize a scalar value function. However, in many real-world problems agents face multiple, possibly conflicting objectives. In such multi-objective problems, the value is a vector rather than a scalar, and we need methods that compute a coverage set, i.e., a set of solutions that is optimal for all possible trade-offs between the objectives. In this project, we propose new multi-objective planning methods that compute the so-called convex coverage set (CCS): the coverage set for settings in which policies can be stochastic or preferences are linear. We show that the CCS has favorable mathematical properties and is typically much easier to compute than the Pareto front, which is often axiomatically assumed to be the solution set for multi-objective decision problems.
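To make the CCS concrete, the sketch below shows the standard linear-programming pruning test for linear preferences: a value vector belongs in the CCS exactly when some weight vector on the simplex makes its scalarized value strictly best among the candidates. This is an illustrative sketch, not the paper's implementation; the example vectors and the helpers `in_ccs` and `prune_to_ccs` are hypothetical, and SciPy's `linprog` stands in for any LP solver.

```python
# Minimal sketch (assumed, not the paper's implementation) of LP-based
# pruning to a convex coverage set (CCS) over hypothetical value vectors.
import numpy as np
from scipy.optimize import linprog

def in_ccs(v, others, tol=1e-9):
    """True iff some weight w on the simplex makes w . v strictly larger
    than w . u for every competitor u, i.e., v is needed in the CCS."""
    n, d = others.shape
    # Decision variables: weights w_1..w_d plus a slack x; maximize x.
    c = np.zeros(d + 1)
    c[-1] = -1.0                      # linprog minimizes, so minimize -x
    # For each competitor u: w . (v - u) >= x  <=>  -(v - u) . w + x <= 0.
    A_ub = np.hstack([-(v - others), np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Weights lie on the probability simplex: sum_i w_i = 1, w_i >= 0.
    A_eq = np.append(np.ones(d), 0.0).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * d + [(None, None)]   # slack x is unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds)
    return res.success and -res.fun > tol      # optimal slack x > 0

def prune_to_ccs(vectors):
    """Keep only vectors that are optimal for some linear scalarization."""
    vs = np.asarray(vectors, dtype=float)
    return [tuple(v) for i, v in enumerate(vs)
            if in_ccs(v, np.delete(vs, i, axis=0))]

if __name__ == "__main__":
    # (1, 1) is Pareto-optimal here, but no linear weighting prefers it
    # to a stochastic mixture of (4, 0) and (0, 4), so it is pruned.
    print(prune_to_ccs([(4, 0), (0, 4), (1, 1)]))  # [(4.0, 0.0), (0.0, 4.0)]
```

The example illustrates the abstract's point: (1, 1) is on the Pareto front of the three vectors, yet it falls outside the CCS because a stochastic mixture of the other two policies matches or beats it under every linear weighting, which is one reason the CCS is typically smaller and cheaper to compute than the Pareto front.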
