Reward Design via Online Gradient Ascent
[1] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[2] Pierre-Yves Oudeyer,et al. Intrinsic Motivation Systems for Autonomous Mental Development , 2007, IEEE Transactions on Evolutionary Computation.
[3] Michael L. Littman,et al. An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci..
[4] Andrew Y. Ng,et al. Near-Bayesian exploration in polynomial time , 2009, ICML '09.
[5] Richard L. Lewis,et al. Variance-Based Rewards for Approximate Bayesian Reinforcement Learning , 2010, UAI.
[6] Lex Weaver,et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning , 2001, UAI.
[7] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[8] P. Bartlett,et al. Stochastic optimization of controlled partially observable Markov decision processes , 2000, Proceedings of the 39th IEEE Conference on Decision and Control (Cat. No.00CH37187).
[9] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .
[10] Richard L. Lewis,et al. Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective , 2010, IEEE Transactions on Autonomous Mental Development.
[11] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009 .
[12] Richard L. Lewis,et al. Where Do Rewards Come From? , 2009 .
[13] Henrik I. Christensen,et al. Co-evolution of Shaping Rewards and Meta-Parameters in Reinforcement Learning , 2008, Adapt. Behav..
[14] Douglas Aberdeen,et al. Scalable Internal-State Policy-Gradient Methods for POMDPs , 2002, ICML.
[15] Csaba Szepesvári,et al. Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods , 2007, UAI.
[16] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009, Autom..
[17] Nuttapong Chentanez,et al. Intrinsically Motivated Reinforcement Learning , 2004, NIPS.
[18] Richard L. Lewis,et al. Internal Rewards Mitigate Agent Boundedness , 2010, ICML.
[19] Peter L. Bartlett,et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[20] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[21] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[22] Jürgen Schmidhuber,et al. Curious model-building control systems , 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.
[23] Çetin Meriçli,et al. General Terms Algorithms , 2022 .
[24] Lee Spector,et al. Genetic Programming for Reward Function Search , 2010, IEEE Transactions on Autonomous Mental Development.
[25] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .