Reinforcement Learning with Unknown Reward Functions

In practical reinforcement learning (RL) scenarios, algorithm designers are often uncertain about which reward function best captures real-world desiderata. However, academic papers typically treat the reward function as either (i) exactly known, leading to the standard reinforcement learning problem, or (ii) entirely unknown, motivating a body of work on intrinsically motivated exploration, where agents learn the dynamics of their environment and visit diverse states, often as a pretraining step before task-specific learning. We propose a framework for reinforcement learning given a distribution over possible reward functions. Our contributions include derivations of the Bayes-optimal and minimax policies in this setting, as well as efficient algorithms for approximating these policies.
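To make the distinction between the two objectives concrete, the following is a minimal sketch on a toy one-step decision problem. It is an illustration only, not the algorithm proposed here: the reward table, the prior, and the helper names `bayes_action` and `maximin_action` are all hypothetical, the problem has a single step (so the agent cannot infer the reward function from observed rewards), and only deterministic action choices are compared, whereas a minimax policy may in general need to be stochastic.

```python
import numpy as np

# Hypothetical toy problem (an assumption for illustration):
# each row is one candidate reward function, each column is one action.
rewards = np.array([
    [1.0, 0.0, 0.6],  # candidate reward function r_1
    [0.0, 1.0, 0.6],  # candidate reward function r_2
])

# Hypothetical prior over the two candidate reward functions.
prior = np.array([0.7, 0.3])


def bayes_action(rewards, prior):
    """Pick the action with the highest reward averaged over the prior."""
    expected = prior @ rewards  # expected reward of each action
    return int(np.argmax(expected)), expected


def maximin_action(rewards):
    """Pick the action whose worst-case reward across candidates is highest."""
    worst = rewards.min(axis=0)  # worst-case reward of each action
    return int(np.argmax(worst)), worst


bayes_a, expected = bayes_action(rewards, prior)
maximin_a, worst = maximin_action(rewards)
print("expected reward per action:", expected, "-> action", bayes_a)
print("worst-case reward per action:", worst, "-> action", maximin_a)
```

In this toy table the two criteria disagree: averaging over the prior favors the action that is best under the more likely reward function, while the worst-case criterion favors the action that does reasonably well under both.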
