Active Inverse Reward Design

Reward design, the problem of selecting an appropriate reward function for an AI system, is both critically important, as it encodes the task the system should perform, and challenging, as it requires reasoning about and understanding the agent's environment in detail. As a result, system designers often iterate on the reward function in a trial-and-error process to get their desired behavior. We propose structuring this process as a series of reward design queries, where we actively select the set of reward functions available to the designer. We query with two types of sets: discrete queries, where the system designer chooses from a small set of reward functions, and feature queries, where the system queries the designer for weights on a small subset of features. After each query, we use inverse reward design (IRD) (Hadfield-Menell et al., 2017) to update the distribution over the true reward function from the observed proxy reward function chosen by the designer. Compared to vanilla IRD, we find that our approach not only decreases the uncertainty about the true reward, but also greatly improves performance in unseen environments while only querying for reward functions in a single training environment.
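To make the query loop concrete, below is a minimal sketch (illustrative only, not the authors' implementation) of one active IRD iteration: a discrete query of two proxy rewards is scored by the expected entropy of the posterior over the true reward, the designer's choice among the offered proxies is modelled as a Boltzmann-rational pick, and the belief is updated by Bayes' rule as in IRD. All names, sizes, and the toy planning stand-in (`candidate_true_rewards`, `feature_expectations`, `beta`, the fixed trajectories) are assumptions made for this example.

```python
# A minimal sketch of one active IRD iteration (illustrative only, not the
# authors' implementation). All names, sizes, and the toy planning stand-in
# below are assumptions made for this example.

import itertools
import numpy as np

rng = np.random.default_rng(0)

n_features = 3
# Discretised hypothesis space over the true reward weights w*.
candidate_true_rewards = rng.normal(size=(30, n_features))
prior = np.full(len(candidate_true_rewards), 1.0 / len(candidate_true_rewards))

# Pool of proxy reward functions the designer could be offered.
proxy_pool = rng.normal(size=(12, n_features))

beta = 5.0  # assumed designer (approximate) rationality


def feature_expectations(proxy_w):
    """Stand-in for planning in the single training environment: a softmax
    over a few fixed trajectories' feature counts, favouring the trajectory
    that scores highest under the proxy reward."""
    trajectories = np.array([[1.0, 0.0, 0.0],
                             [0.0, 1.0, 0.0],
                             [0.0, 0.0, 1.0],
                             [0.5, 0.5, 0.0]])
    scores = trajectories @ proxy_w
    probs = np.exp(beta * (scores - scores.max()))
    probs /= probs.sum()
    return probs @ trajectories


def choice_likelihood(query, true_w):
    """P(designer picks each proxy in `query` | true reward true_w):
    Boltzmann choice over the true return induced by each proxy."""
    returns = np.array([feature_expectations(w) @ true_w for w in query])
    logits = beta * (returns - returns.max())
    probs = np.exp(logits)
    return probs / probs.sum()


def posterior_after_answer(query, answer_idx, belief):
    """IRD-style Bayesian update given which proxy the designer chose."""
    lik = np.array([choice_likelihood(query, w)[answer_idx]
                    for w in candidate_true_rewards])
    post = belief * lik
    return post / post.sum()


def expected_posterior_entropy(query, belief):
    """Score a query by the posterior entropy, averaged over the designer's
    predicted answers; lower means a more informative query."""
    total = 0.0
    for answer_idx in range(len(query)):
        p_answer = sum(p * choice_likelihood(query, w)[answer_idx]
                       for p, w in zip(belief, candidate_true_rewards))
        post = posterior_after_answer(query, answer_idx, belief)
        total += p_answer * -(post * np.log(post + 1e-12)).sum()
    return total


# Greedily pick the most informative discrete query of two proxy rewards.
best_query = min(itertools.combinations(proxy_pool, 2),
                 key=lambda q: expected_posterior_entropy(list(q), prior))

# Simulate the designer (true reward = first hypothesis) answering, then update.
answer = int(np.argmax(choice_likelihood(list(best_query), candidate_true_rewards[0])))
prior = posterior_after_answer(list(best_query), answer, prior)
print("posterior entropy:", -(prior * np.log(prior + 1e-12)).sum())
```

A feature query would fit the same loop if the answer space were a discretised grid of weights over the selected features; the exhaustive search over candidate query pairs here is purely for clarity.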

[1]  Peter Stone, et al.  Interactively shaping agents via human reinforcement: the TAMER framework, 2009, K-CAP '09.

[2]  Zoubin Ghahramani, et al.  Deep Bayesian Active Learning with Image Data, 2017, ICML.

[3]  Ashish Kapoor, et al.  On Greedy Maximization of Entropy, 2015, ICML.

[4]  Mukesh Singhal, et al.  Learning from Richer Human Guidance: Augmenting Comparison-Based Learning with Feature Queries, 2018, HRI.

[5]  Fabio Viola, et al.  Learning and Querying Fast Generative Models for Reinforcement Learning, 2018, ArXiv.

[6]  Zoubin Ghahramani, et al.  Bayesian Active Learning for Classification and Preference Learning, 2011, ArXiv.

[7]  Anca D. Dragan, et al.  Inverse Reward Design, 2017, NIPS.

[8]  Jakub W. Pachocki, et al.  Learning dexterous in-hand manipulation, 2018, Int. J. Robotics Res.

[9]  Johannes Fürnkranz, et al.  A Survey of Preference-Based Reinforcement Learning Methods, 2017, J. Mach. Learn. Res.

[10]  Andrew Y. Ng, et al.  Algorithms for Inverse Reinforcement Learning, 2000, ICML.

[11]  John Schulman, et al.  Concrete Problems in AI Safety, 2016, ArXiv.

[12]  Pieter Abbeel, et al.  Apprenticeship learning via inverse reinforcement learning, 2004, ICML.

[13]  Johannes Fürnkranz, et al.  Model-Free Preference-Based Reinforcement Learning, 2016, AAAI.

[14]  Shane Legg, et al.  Deep Reinforcement Learning from Human Preferences, 2017, NIPS.

[15]  Simon M. Lucas, et al.  Preference Learning for Move Prediction and Evaluation Function Approximation in Othello, 2014, IEEE Transactions on Computational Intelligence and AI in Games.

[16]  Eyke Hüllermeier, et al.  Preference-based reinforcement learning: a formal framework and a policy iteration algorithm, 2012, Mach. Learn.

[17]  Michèle Sebag, et al.  APRIL: Active Preference-learning based Reinforcement Learning, 2012, ECML/PKDD.

[18]  Richard L. Lewis, et al.  Where Do Rewards Come From, 2009.

[19]  Eyal Amir, et al.  Bayesian Inverse Reinforcement Learning, 2007, IJCAI.

[20]  Yuchen Cui, et al.  Active Reward Learning from Critiques, 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[21]  Pieter Abbeel, et al.  Value Iteration Networks, 2016, NIPS.

[22]  Peter Stone, et al.  Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces, 2017, AAAI.

[23]  Anca D. Dragan, et al.  Active Preference-Based Learning of Reward Functions, 2017, Robotics: Science and Systems.

[24]  Oliver Kroemer, et al.  Active Reward Learning, 2014, Robotics: Science and Systems.

[25]  Rémi Munos, et al.  Learning to Search with MCTSnets, 2018, ICML.

[26]  Allan Jabri, et al.  Universal Planning Networks, 2018, ICML.

[27]  Nan Jiang, et al.  Repeated Inverse Reinforcement Learning, 2017, NIPS.

[28]  Eyke Hüllermeier, et al.  Preference Learning, 2005, Künstliche Intell.

[29]  Eyke Hüllermeier, et al.  Preference Learning, 2010.

[30]  Anind K. Dey, et al.  Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.

[31]  OpenAI.  Learning Dexterous In-Hand Manipulation, 2018.