Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation

Reinforcement learning by direct policy gradient estimation is attractive in theory but in practice leads to notoriously ill-behaved optimization problems. We improve its robustness and speed of convergence with stochastic meta-descent (SMD), a gain vector adaptation method that employs fast Hessian-vector products. In our experiments, the resulting algorithms outperform previously employed online stochastic, offline conjugate, and natural policy gradient methods.
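The gain vector adaptation described in the abstract can be sketched as follows. This is a minimal illustration, assuming the SMD update in the form Schraudolph usually states it for minimization. Each parameter has its own gain p. The gains are adapted multiplicatively from the correlation between the current gradient g and an auxiliary vector v, and v is maintained with a Hessian-vector product Hv, so the Hessian itself is never formed. The function name `smd_minimize`, the toy quadratic objective, and the values of `p0`, `mu` and `lam` are hypothetical choices for illustration, not the paper's settings. The exact product A v stands in for the fast curvature-matrix-vector products a policy-gradient learner would compute. To maximize reward, as in policy gradient, one would pass the negated gradient estimate and the correspondingly negated Hessian-vector product.

```python
import numpy as np

def smd_minimize(grad, hess_vec, theta0, p0=0.05, mu=0.05, lam=0.99, steps=500):
    """Stochastic meta-descent (SMD) sketch for minimizing an objective.

    grad(theta)        -> gradient g_t (may be a noisy, stochastic estimate)
    hess_vec(theta, v) -> Hessian-vector product H_t v, without forming H
    p0, mu, lam        -> initial gain, meta-step size, decay factor
                          (hypothetical illustrative values)
    """
    theta = np.asarray(theta0, dtype=float).copy()
    p = np.full_like(theta, p0)  # per-parameter gain vector
    v = np.zeros_like(theta)     # running estimate of d(theta)/d(log p)
    for _ in range(steps):
        g = grad(theta)
        # Multiplicative gain update; a gain shrinks by at most a factor of 1/2 per step.
        p = p * np.maximum(0.5, 1.0 - mu * g * v)
        # Hessian-vector product at the current parameters, before they move.
        hv = hess_vec(theta, v)
        # Ordinary gradient step, but with a separate gain for each parameter.
        theta = theta - p * g
        # Update v from the new gains, the gradient, and the curvature term.
        v = lam * v - p * (g + lam * hv)
    return theta, p

# Hypothetical ill-conditioned quadratic: f(theta) = 0.5 theta^T A theta - b^T theta.
A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
theta, gains = smd_minimize(
    grad=lambda th: A @ th - b,
    hess_vec=lambda th, v: A @ v,  # exact H v for a quadratic objective
    theta0=np.zeros(2),
)
print(theta)  # approaches the minimizer A^{-1} b = [1.0, 0.1]
print(gains)  # gain on the flat (curvature 1) coordinate has grown above p0
```

On this deterministic toy problem, the gain on the stiff coordinate (curvature 10) stays close to its initial value, while the gain on the flat coordinate grows. This per-parameter rescaling is the effect SMD is meant to provide. In the stochastic setting, `grad` would instead return a noisy estimate, such as a likelihood-ratio policy gradient.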
