Anderson Acceleration for Reinforcement Learning

Anderson acceleration is an old and simple method for accelerating the computation of a fixed point. However, as far as we know, and quite surprisingly, it has never been applied to dynamic programming or reinforcement learning. In this paper, we briefly explain what Anderson acceleration is and how it can be applied to value iteration. Preliminary experiments, which we discuss critically, show a significant speed-up of convergence. We also discuss how this idea could be applied more generally to (deep) reinforcement learning.
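To make the idea concrete, here is a minimal sketch of Anderson acceleration applied to value iteration, written for illustration and not taken from the paper. It assumes a tabular MDP with transition tensor `P` of shape (actions, states, states) and reward matrix `R` of shape (states, actions). The function names, the memory size `m`, the relative regularization `reg`, and the fallback to a plain Bellman step when the extrapolated candidate does not reduce the Bellman residual are all assumptions of this sketch.

```python
import numpy as np

def bellman(v, P, R, gamma):
    """Bellman optimality operator: (Tv)(s) = max_a [R(s,a) + gamma * sum_t P(a,s,t) v(t)]."""
    return (R + gamma * np.einsum("ast,t->sa", P, v)).max(axis=1)

def anderson_vi(P, R, gamma, m=5, tol=1e-8, max_iter=1000, reg=1e-10):
    """Value iteration with Anderson mixing over the last m+1 iterates (illustrative sketch)."""
    v = np.zeros(R.shape[0])
    tv = bellman(v, P, R, gamma)
    hist_v, hist_tv = [v], [tv]
    for k in range(max_iter):
        res = np.max(np.abs(tv - v))          # sup-norm Bellman residual of v
        if res < tol:
            return tv, k
        V = np.stack(hist_v, axis=1)          # stored iterates, shape (S, n)
        TV = np.stack(hist_tv, axis=1)        # their images under T, shape (S, n)
        G = TV - V                            # residuals T v_i - v_i
        # Solve min ||G alpha||_2 subject to sum(alpha) = 1 via the normal equations,
        # with a small relative Tikhonov term (an assumption) for numerical stability.
        M = G.T @ G
        M = M + reg * np.linalg.norm(M) * np.eye(M.shape[0])
        w = np.linalg.solve(M, np.ones(M.shape[0]))
        alpha = w / w.sum()
        cand = TV @ alpha                     # extrapolated candidate
        t_cand = bellman(cand, P, R, gamma)
        # Safeguard (an assumption of this sketch): accept the candidate only if it
        # contracts the residual at least as well as a plain step is guaranteed to.
        if np.max(np.abs(t_cand - cand)) <= gamma * res:
            v, tv = cand, t_cand
        else:
            v, tv = tv, bellman(tv, P, R, gamma)
        hist_v = (hist_v + [v])[-(m + 1):]
        hist_tv = (hist_tv + [tv])[-(m + 1):]
    return tv, max_iter

# Small random MDP, for illustration only.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 20, 4, 0.9
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)             # make each row a probability distribution
R = rng.random((n_states, n_actions))
v_aa, n_iter = anderson_vi(P, R, gamma)
```

With the safeguard, every iteration shrinks the sup-norm Bellman residual by at least a factor of `gamma`, so the sketch cannot do worse than plain value iteration in iteration count, at the cost of a second Bellman evaluation whenever a candidate is rejected.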
