Multi-objectivization of reinforcement learning problems by reward shaping

Multi-objectivization is the process of transforming a single-objective problem into a multi-objective problem. Research in evolutionary optimization has demonstrated that adding objectives that are correlated with the original objective can make the resulting problem easier to solve than the original single-objective problem. In this paper we investigate the multi-objectivization of reinforcement learning problems. We propose a novel method for the multi-objectivization of Markov decision processes through the use of multiple reward shaping functions. Reward shaping is a technique to speed up reinforcement learning by including additional heuristic knowledge in the reward signal. The resulting composite reward signal is expected to be more informative during learning, leading the learner to identify good actions more quickly. Good reward shaping functions are by definition correlated with the target value function for the base reward signal, and we show in this paper that adding several such correlated signals can help solve the basic single-objective problem faster and better. We prove that the total ordering of solutions, and by consequence the optimality of solutions, is preserved in this process, and empirically demonstrate the usefulness of this approach on two reinforcement learning tasks: a pathfinding problem and the Mario domain.
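
To make the idea concrete, below is a minimal sketch, not the paper's implementation, of how multiple potential-based shaping functions can multi-objectivize a simple pathfinding task and then be recombined by linear scalarization. The grid size, the two potential functions, the scalarization weights, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: tabular Q-learning on a small grid-world pathfinding task,
# multi-objectivized by two potential-based shaping functions whose shaped
# reward signals are combined by a weighted sum (linear scalarization).
# All constants and heuristics below are assumed for illustration only.

SIZE, GOAL = 5, (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
GAMMA, ALPHA, EPS = 0.95, 0.1, 0.1

def step(state, action):
    r, c = state
    dr, dc = ACTIONS[action]
    nxt = (min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1))
    reward = 1.0 if nxt == GOAL else -0.01      # base (single-objective) reward
    return nxt, reward, nxt == GOAL

# Two heuristic potential functions, each correlated with progress to the goal.
potentials = [
    lambda s: -(abs(s[0] - GOAL[0]) + abs(s[1] - GOAL[1])),  # negative Manhattan distance
    lambda s: s[0] + s[1],                                    # coarse progress heuristic
]
weights = [0.5, 0.5]                                          # assumed scalarization weights

def shaped_reward(r, s, s_next):
    # One shaped signal per objective: r + F_i(s, s') with
    # F_i(s, s') = gamma * phi_i(s') - phi_i(s), then a weighted sum.
    return sum(w * (r + GAMMA * phi(s_next) - phi(s)) for w, phi in zip(weights, potentials))

Q = np.zeros((SIZE, SIZE, len(ACTIONS)))
rng = np.random.default_rng(0)

for episode in range(500):
    s, done = (0, 0), False
    while not done:
        a = rng.integers(len(ACTIONS)) if rng.random() < EPS else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        target = shaped_reward(r, s, s_next) + GAMMA * np.max(Q[s_next]) * (not done)
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s_next
```

Because each shaping term is potential-based, it leaves the optimal policy of the base problem unchanged; the weighted sum is only one way to recombine the shaped objectives, chosen here for brevity.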
