Recursive Adaptation of Stepsize Parameter for Non-stationary Environments

In this article, we propose a method for adapting the stepsize parameters used in reinforcement learning to non-stationary environments. In standard reinforcement learning settings, the stepsize parameter is annealed toward zero during learning, because the environment is assumed to be noisy but stationary, so that the true expected rewards are fixed. In contrast, we assume that in the real world the true expected rewards change over time, and hence the learning agent must track these changes through continuous learning. We derive the higher-order derivatives, with respect to the stepsize parameter, of the exponential moving average that is used in major reinforcement learning methods to estimate the expected values of states or actions. We also present a mechanism that computes these derivatives recursively. Using this mechanism, we construct a precise and flexible adaptation method for the stepsize parameter that optimizes a given criterion, for example minimizing the squared error. The proposed method is validated both theoretically and experimentally.
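
To make the underlying recursion concrete, the following is a minimal first-order sketch, not the paper's full method (which uses higher-order derivatives): the derivative of the exponential-moving-average estimate with respect to the stepsize is maintained recursively and used to adapt the stepsize by gradient descent on the squared prediction error. The class name, the meta-stepsize eta, and the clipping bounds are illustrative assumptions.

```python
import numpy as np


class AdaptiveEMA:
    """Sketch of stepsize adaptation for an EMA estimator (first order only)."""

    def __init__(self, alpha=0.1, eta=0.01):
        self.v = 0.0        # current EMA estimate of the expected reward
        self.dv = 0.0       # dV/d(alpha), maintained recursively
        self.alpha = alpha  # stepsize parameter to be adapted
        self.eta = eta      # meta-stepsize for adapting alpha (assumed value)

    def update(self, r):
        delta = r - self.v
        # Gradient of the squared error (1/2)*delta^2 w.r.t. alpha is
        # -delta * dV/d(alpha), so ascend along delta * dV/d(alpha).
        self.alpha = float(np.clip(self.alpha + self.eta * delta * self.dv,
                                   1e-4, 1.0))
        # Recursive derivative update, from V_{t+1} = V_t + alpha*(r_t - V_t):
        #   dV_{t+1}/d(alpha) = (1 - alpha)*dV_t/d(alpha) + (r_t - V_t)
        self.dv = (1.0 - self.alpha) * self.dv + delta
        # EMA update of the estimate itself
        self.v += self.alpha * delta
        return self.v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    est = AdaptiveEMA()
    # Non-stationary reward: the true mean jumps from 1.0 to 3.0 at t = 500.
    for t in range(1000):
        true_mean = 1.0 if t < 500 else 3.0
        est.update(true_mean + rng.normal(scale=0.5))
    print(f"estimate = {est.v:.2f}, adapted alpha = {est.alpha:.3f}")
```

In this sketch a large, persistent prediction error (as after the jump in the true mean) pushes alpha upward so the estimate tracks the change quickly, whereas pure noise around a fixed mean lets alpha shrink; the paper's recursive higher-order derivatives refine this first-order scheme.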
