Internal Rewards Mitigate Agent Boundedness

Reinforcement learning (RL) research typically develops algorithms that help an RL agent best achieve its goals, however those goals came to be defined, while ignoring how those goals relate to the goals of the agent designer. We extend agent design to include the meta-optimization problem of selecting internal agent goals (rewards) that optimize the designer's goals. Our claim is that well-designed internal rewards can improve the performance of RL agents that are computationally bounded in some way (as all practical agents are). We present a formal framework for understanding both bounded agents and the meta-optimization problem, and we empirically demonstrate several instances in which common agent bounds are mitigated by general internal reward functions.
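
As an illustrative sketch of the meta-optimization problem (the notation here is ours, not necessarily the paper's): write $\mathcal{R}$ for a space of candidate internal reward functions, $A(r)$ for the bounded agent induced by internal reward $r$, $h$ for the interaction history that $A(r)$ generates in the environment $M$, and $U$ for the designer's objective evaluated on that history. Reward design can then be framed as

% Hedged sketch of the reward-design meta-optimization (assumed notation):
% the designer searches over internal rewards r; each r induces a bounded
% agent A(r) whose behavior in M yields a history h, scored by the
% designer's own objective U.
\[
  r^{\ast} \;=\; \operatorname*{arg\,max}_{r \in \mathcal{R}}\;
  \mathbb{E}\!\left[\, U(h) \;\middle|\; h \sim \langle M,\, A(r) \rangle \,\right].
\]

The point of this framing is that, when the agent $A(r)$ is computationally bounded, the optimizing internal reward $r^{\ast}$ need not coincide with the designer's objective $U$ itself.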