On-line Reinforcement Learning for Nonlinear Motion Control: Quadratic and Non-Quadratic Reward Functions

Abstract Reinforcement learning (RL) is an active research area with applications in many fields. RL can be used to learn control strategies for nonlinear dynamic systems without requiring a mathematical model of the system. An essential element in RL is the reward function, which bears a close resemblance to the cost function in optimal control. Analogous to linear quadratic (LQ) control, a quadratic reward function has been applied in RL. However, the literature offers no analysis or motivation for this choice beyond the parallel to LQ control. This paper shows that the use of a quadratic reward function in on-line RL may lead to counter-intuitive results in the form of a large steady-state error. Although the RL controller learns well, the final performance is not acceptable from a control-theoretic point of view. The reasons for this discrepancy are analyzed and the results are compared with non-quadratic reward functions (absolute value and square root) using a model-learning actor-critic with local linear regression. One of the conclusions is that the absolute-value reward function reduces the steady-state error considerably, while the learning time is only slightly longer than with the quadratic reward.
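The abstract contrasts quadratic, absolute-value, and square-root reward shapes. As a rough illustration of why the shape matters near zero error, the sketch below evaluates the three penalty shapes on a scalar tracking error. The scalar form, the weight q, and the function names are illustrative assumptions for this sketch and are not the exact reward definitions or weighting matrices used in the paper.

```python
import numpy as np

# Illustrative per-step reward shapes on a scalar tracking error e.
# Assumption: a single weighted error term; the paper penalizes the
# full state with its own weighting.

def quadratic_reward(e, q=1.0):
    """Quadratic penalty, analogous to an LQ cost.
    Its gradient vanishes near e = 0, so small residual errors
    are barely penalized."""
    return -q * e**2

def absolute_reward(e, q=1.0):
    """Absolute-value penalty: constant gradient magnitude,
    so small errors still incur a noticeable penalty."""
    return -q * np.abs(e)

def sqrt_reward(e, q=1.0):
    """Square-root penalty: gradient magnitude grows as e -> 0,
    penalizing small errors even more strongly."""
    return -q * np.sqrt(np.abs(e))

if __name__ == "__main__":
    # Compare the three shapes on a few error values.
    for e in np.linspace(-1.0, 1.0, 5):
        print(f"e={e:+.2f}  quad={quadratic_reward(e):+.3f}  "
              f"abs={absolute_reward(e):+.3f}  sqrt={sqrt_reward(e):+.3f}")
```

Printing the three penalties side by side shows that for small errors the quadratic penalty is much flatter than the other two, which is consistent with the steady-state error phenomenon the paper analyzes.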