Dynamic Weights in Multi-Objective Deep Reinforcement Learning

Many real-world decision problems are characterized by multiple conflicting objectives that must be balanced according to their relative importance. In the dynamic weights setting, this relative importance changes over time, so specialized algorithms that cope with such changes are required, such as the tabular Reinforcement Learning (RL) algorithm of Natarajan & Tadepalli (2005). However, this earlier tabular approach is not feasible in RL settings that require function approximators. We generalize across weight changes and high-dimensional inputs by proposing a multi-objective Q-network whose outputs are conditioned on the relative importance of the objectives, and we introduce Diverse Experience Replay (DER) to counter the inherent non-stationarity of the dynamic weights setting. In an extensive experimental evaluation, we compare our methods to adapted algorithms from Deep Multi-Task/Multi-Objective RL and show that the proposed network in combination with DER dominates these adapted algorithms across weight change scenarios and problem domains.
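
To illustrate what balancing objectives by relative importance can look like at action-selection time, the following minimal sketch assumes linear scalarization: each action has a vector of per-objective Q-values, and the agent picks the action whose weighted sum is largest. The scalarization choice, the function names, and the toy Q-values are assumptions made for illustration and are not taken from the abstract; in the proposed method the Q-vectors would come from a network conditioned on the weights rather than from a fixed table.

```python
def scalarize(q_vector, weights):
    """Weighted sum of per-objective Q-values (linear scalarization, an assumption)."""
    return sum(w * q for w, q in zip(weights, q_vector))


def greedy_action(q_values, weights):
    """Return the index of the action with the highest scalarized value.

    q_values: list of per-action Q-vectors, one entry per objective.
    weights:  current relative importance of each objective.
    """
    scores = [scalarize(q, weights) for q in q_values]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical Q-vectors for three actions and two objectives.
toy_q = [
    [1.0, 0.0],  # action 0: good for objective 0 only
    [0.0, 1.0],  # action 1: good for objective 1 only
    [0.6, 0.6],  # action 2: a compromise between both
]

# The same Q-vectors yield different greedy actions as the weights change.
action_obj0 = greedy_action(toy_q, [0.9, 0.1])
action_obj1 = greedy_action(toy_q, [0.1, 0.9])
action_even = greedy_action(toy_q, [0.5, 0.5])
```

The point of the toy example is that a change in weights alone changes the preferred action, which is why a value function that is fixed to one weight vector cannot simply be reused when the weights shift.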

[2] Yoshua Bengio, et al. Universal Successor Representations for Transfer Reinforcement Learning, 2018, ICLR.

[4] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.

[5] Tom Schaul, et al. Universal Successor Features Approximators, 2018, ICLR.

[6] Olac Fuentes, et al. Knowledge Transfer in Deep Convolutional Neural Nets, 2007, Int. J. Artif. Intell. Tools.

[7] Fang Zhang, et al. Combining Deep Reinforcement Learning and Safety Based Control for Autonomous Driving, 2016, ArXiv.

[8] Matthew E. Taylor, et al. Initial Progress in Transfer for Deep Reinforcement Learning Algorithms, 2016.

[9] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.

[10] Robert Babuska, et al. Experience Selection in Deep Reinforcement Learning for Control, 2018, J. Mach. Learn. Res.

[11] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.

[12] Ruslan Salakhutdinov, et al. Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning, 2015, ICLR.

[13] Peter Stone, et al. Transfer Learning for Reinforcement Learning Domains: A Survey, 2009, J. Mach. Learn. Res.

[14] Kalyanmoy Deb, et al. A fast and elitist multiobjective genetic algorithm: NSGA-II, 2002, IEEE Trans. Evol. Comput.

[16] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.

[17] Evan Dekker, et al. Empirical evaluation methods for multiobjective reinforcement learning algorithms, 2011, Machine Learning.

[18] David Isele, et al. Selective Experience Replay for Lifelong Learning, 2018, AAAI.

[19] Sriraam Natarajan, et al. Dynamic preferences in multi-criteria reinforcement learning, 2005, ICML.

[21] Christoph Zauner, et al. Implementation and Benchmarking of Perceptual Image Hash Functions, 2010.

[22] Tom Schaul, et al. Universal Value Function Approximators, 2015, ICML.

[23] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.

[24] Shimon Whiteson, et al. A Survey of Multi-Objective Sequential Decision-Making, 2013, J. Artif. Intell. Res.

[25] Shimon Whiteson, et al. Multi-Objective Decision Making, 2017, Synthesis Lectures on Artificial Intelligence and Machine Learning.

[26] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.

[27] Tom Schaul, et al. Successor Features for Transfer in Reinforcement Learning, 2016, NIPS.

[28] Joseph A. Paradiso, et al. The gesture recognition toolkit, 2014, J. Mach. Learn. Res.

[29] Shimon Whiteson, et al. Multi-Objective Deep Reinforcement Learning, 2016, ArXiv.

[30] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.

[31] Ann Nowé, et al. Multi-objective reinforcement learning using sets of Pareto dominating policies, 2014, J. Mach. Learn. Res.

[32] Dewen Hu, et al. Multiobjective Reinforcement Learning: A Comprehensive Overview, 2015, IEEE Transactions on Systems, Man, and Cybernetics: Systems.

[33] Chris Watkins, et al. Learning from delayed rewards, 1989.

[34] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.