Paper information - Reinforcement Learning with Unknown Reward Functions - 字舞流文

Reinforcement Learning with Unknown Reward Functions

In practical reinforcement learning (RL) scenarios, algorithm designers might express uncertainty over which reward function best captures real-world desiderata. However, academic papers typically treat the reward function as either (i) exactly known, leading to the standard reinforcement learning problem, or (ii) unknown, motivating a body of work on intrinsically-motivated exploration, where agents learn the dynamics of their environment and visit diverse states, often as a pretraining step to task-specific learning. We propose a framework for reinforcement learning given a distribution over possible reward functions. Our contributions include derivations of the Bayes-optimal and minimax policies in this setting as well as efficient algorithms for approximating these policies.
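To make the distinction between the two objectives concrete, here is a minimal sketch for a one-step (bandit-style) problem with a finite set of candidate reward functions. It is an illustration only, not the paper's algorithm: the reward tables, the prior, and the function names `bayes_action` and `maximin_action` are all hypothetical, and the search is restricted to deterministic actions, whereas a minimax policy may in general need to randomize.

```python
from typing import Dict, List

# Hypothetical toy problem: 3 actions and 2 candidate reward functions.
# REWARDS[name][a] is the reward of action a under that candidate.
REWARDS: Dict[str, List[float]] = {
    "r_a": [1.0, 0.0, 0.6],
    "r_b": [0.0, 1.0, 0.5],
}

# Assumed prior belief over which candidate is the true reward function.
PRIOR: Dict[str, float] = {"r_a": 0.7, "r_b": 0.3}


def bayes_action(rewards: Dict[str, List[float]], prior: Dict[str, float]) -> int:
    """Return the action with the highest expected reward under the prior."""
    n_actions = len(next(iter(rewards.values())))
    expected = [
        sum(prior[name] * rewards[name][a] for name in rewards)
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=lambda a: expected[a])


def maximin_action(rewards: Dict[str, List[float]]) -> int:
    """Return the deterministic action with the best worst-case reward."""
    n_actions = len(next(iter(rewards.values())))
    worst = [min(rewards[name][a] for name in rewards) for a in range(n_actions)]
    return max(range(n_actions), key=lambda a: worst[a])


if __name__ == "__main__":
    print("Bayes action:", bayes_action(REWARDS, PRIOR))
    print("Maximin action:", maximin_action(REWARDS))
```

In this toy setup the two criteria disagree: the expected-reward rule commits to the action favored by the more likely candidate, while the worst-case rule picks the action that is acceptable under both candidates.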

R. Salakhutdinov | Benjamin Eysenbach | Jacob Tyo | Shane Gu | Zachary Lipton | Sergey Levine
