Reinforcement Learning and the Reward Engineering Principle

AI agents are becoming significantly more general and autonomous. We argue for the “Reward Engineering Principle”: as reinforcement-learning-based AI systems become more general and autonomous, the design of reward mechanisms that elicit desired behaviours becomes both more important and more difficult. While early AI research could ignore reward design and focus solely on the problems of efficient, flexible, and effective achievement of arbitrary goals in varied environments, the reward engineering principle will affect modern AI research, both theoretical and applied, in the medium and long term. We introduce some notation and derive preliminary results that formalize the intuitive landmarks of the area of reward design.