Expressing Tasks Robustly via Multiple Discount Factors

Reward engineering is the problem of expressing a target task for an agent as rewards in a Markov decision process. To be useful for learning, these encodings must be robust to structural changes in the underlying domain; that is, the specification should remain unchanged for any domain in some target class. We identify tasks that are difficult to express robustly via the standard model of discounted rewards. In response, we examine the idea of decomposing a reward function into separate components, each with its own discount factor. We describe a method for finding robust parameters through task engineering, which modifies the discount factors as well as the rewards. We present a method for optimizing behavior in this setting and show that it can provide a more robust task-specification language than standard approaches.

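As a rough illustration of the decomposition described above, the sketch below computes the return of a single trajectory whose per-step reward is split into components, each accumulated under its own discount factor. The function name, the NumPy-based implementation, and the example components (a "progress" reward versus a "safety" penalty) are illustrative assumptions, not details taken from the paper itself.

```python
import numpy as np

def multi_discount_return(component_rewards, gammas):
    """Trajectory return under a componentwise-discounted reward.

    component_rewards: shape (T, k) -- k reward components at each of T steps
    gammas: shape (k,) -- one discount factor per component

    Computes G = sum_i sum_t gammas[i]**t * component_rewards[t, i],
    i.e. each reward component is discounted by its own factor before summing.
    """
    component_rewards = np.asarray(component_rewards, dtype=float)
    gammas = np.asarray(gammas, dtype=float)
    T, _ = component_rewards.shape
    # discounts[t, i] = gammas[i] ** t, built by broadcasting
    discounts = gammas[None, :] ** np.arange(T)[:, None]
    return float((discounts * component_rewards).sum())

# Hypothetical example: a per-step progress reward discounted at 0.9,
# plus a safety penalty that is left effectively undiscounted (factor 1.0).
trajectory = [[1.0, -0.1],
              [1.0, -0.1],
              [1.0, -0.1]]
print(multi_discount_return(trajectory, gammas=[0.9, 1.0]))
```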