Expressing Tasks Robustly via Multiple Discount Factors
[1] Shane Frederick, et al. Time Discounting and Time Preference: A Critical Review, 2002.
[2] Stuart J. Russell, et al. Q-Decomposition for Reinforcement Learning Agents, 2003, ICML.
[3] Evan Dekker, et al. Empirical evaluation methods for multiobjective reinforcement learning algorithms, 2011, Machine Learning.
[4] Edmund H. Durfee, et al. Stationary Deterministic Policies for Constrained MDPs with Multiple Rewards, Costs, and Discount Factors, 2005, IJCAI.
[5] Daniel Dewey, et al. Reinforcement Learning and the Reward Engineering Principle, 2014, AAAI Spring Symposia.
[6] Sven Koenig, et al. The interaction of representations and planning objectives for decision-theoretic planning tasks, 2002, J. Exp. Theor. Artif. Intell..
[7] Peter Stone, et al. Learning non-myopically from human-generated reward, 2013, IUI '13.
[8] Ted O'Donoghue, et al. Doing It Now or Later, 1999.
[9] Richard L. Lewis, et al. Where Do Rewards Come From?, 2009.
[10] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[11] Maja J. Mataric, et al. Reward Functions for Accelerated Learning, 1994, ICML.
[12] Sridhar Mahadevan, et al. To Discount or Not to Discount in Reinforcement Learning: A Case Study Comparing R Learning and Q Learning, 1994, ICML.
[13] Eugene A. Feinberg, et al. Constrained dynamic programming with two discount factors: applications and an algorithm, 1999, IEEE Trans. Autom. Control..
[14] Preben Alstrøm, et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping, 1998, ICML.
[15] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[16] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.