On-line Reinforcement Learning for Nonlinear Motion Control: Quadratic and Non-Quadratic Reward Functions
暂无分享,去创建一个
[1] Maja J. Mataric,et al. Reward Functions for Accelerated Learning , 1994, ICML.
[2] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[3] Warren B. Powell,et al. An Adaptive Dynamic Programming Algorithm for Dynamic Fleet Management, I: Single Period Travel Times , 2002, Transp. Sci..
[4] Stefan Schaal,et al. Reinforcement Learning for Humanoid Robotics , 2003 .
[5] Andrew W. Moore,et al. Locally Weighted Learning , 1997, Artificial Intelligence Review.
[6] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[7] Pieter Abbeel,et al. An Application of Reinforcement Learning to Aerobatic Helicopter Flight , 2006, NIPS.
[8] Louis Wehenkel,et al. Clinical data based optimal STI strategies for HIV: a reinforcement learning approach , 2006, Proceedings of the 45th IEEE Conference on Decision and Control.
[9] Bhaskara Marthi,et al. Automatic shaping and decomposition of reward functions , 2007, ICML '07.
[10] Richard L. Lewis,et al. Where Do Rewards Come From , 2009 .
[11] Bart De Schutter,et al. Reinforcement Learning and Dynamic Programming Using Function Approximators , 2010 .
[12] Robert Babuska,et al. Efficient Model Learning Methods for Actor–Critic Control , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).