Constructing Stochastic Mixture Policies for Episodic Multiobjective Reinforcement Learning Tasks

Multiobjective reinforcement learning algorithms extend reinforcement learning techniques to problems with multiple conflicting objectives. This paper discusses the advantages gained from applying stochastic policies to multiobjective tasks and examines a particular form of stochastic policy known as a mixture policy. Two methods are proposed for deriving mixture policies for episodic multiobjective tasks from deterministic base policies found via scalarised reinforcement learning. It is shown that these approaches are an efficient means of identifying solutions which match the user's preferences more closely than can be achieved by methods based strictly on deterministic policies.
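To make the idea of a mixture policy concrete, the following is a minimal sketch, not the paper's two proposed methods. It assumes a mixture policy picks one deterministic base policy at the start of each episode according to fixed probabilities, so that the expected per-episode return vector is the probability-weighted average of the base policies' return vectors. The function names and the example return vectors are hypothetical and chosen only for illustration.

```python
import random


def mixture_return(base_returns, weights):
    """Expected per-episode return vector of a mixture policy.

    base_returns: one return vector (list of floats) per base policy.
    weights: probability of selecting each base policy for an episode.
    Assumes one base policy is followed for the whole episode.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing probabilities must sum to 1")
    n_objectives = len(base_returns[0])
    return [
        sum(w * r[i] for w, r in zip(weights, base_returns))
        for i in range(n_objectives)
    ]


def mixing_weight_for_target(return_a, return_b, objective, target):
    """Probability of choosing policy A (versus B) so that the expected
    return on one objective equals the target value.

    Hypothetical helper: solves p * a + (1 - p) * b = target for p.
    """
    span = return_a[objective] - return_b[objective]
    if span == 0:
        raise ValueError("base policies do not differ on this objective")
    p = (target - return_b[objective]) / span
    if not 0.0 <= p <= 1.0:
        raise ValueError("target is not reachable by mixing these two policies")
    return p


def select_base_policy(weights, rng=random):
    """Sample the index of the base policy to follow for the next episode."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Illustrative return vectors for two deterministic base policies (made up).
policy_a = [10.0, 0.0]
policy_b = [0.0, 10.0]

# Mix the two so that the expected return on objective 0 is 4.0.
p_a = mixing_weight_for_target(policy_a, policy_b, objective=0, target=4.0)
expected = mixture_return([policy_a, policy_b], [p_a, 1.0 - p_a])
```

Under these assumptions, neither base policy alone yields an expected return between the two extremes, whereas the mixture reaches any point on the line segment joining them, averaged over many episodes.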
