TIDBD: Adapting Temporal-difference Step-sizes Through Stochastic Meta-descent

In this paper, we introduce a method for adapting the step-sizes of temporal-difference (TD) learning. The performance of TD methods often depends on well-chosen step-sizes, yet few algorithms have been developed for setting the step-size automatically for TD learning. An important limitation of current methods is that they adapt a single step-size shared by all the weights of the learning system. A vector step-size allows finer-grained optimization by specifying a separate step-size for each feature. Furthermore, adapting step-sizes at different rates has the added benefit of being a simple form of representation learning. We generalize Incremental Delta Bar Delta (IDBD), a vectorized adaptive step-size method for supervised learning, to TD learning, and name the resulting method TIDBD. We demonstrate that TIDBD is able to find appropriate step-sizes in both stationary and non-stationary prediction tasks, outperforming ordinary TD methods and TD methods with scalar step-size adaptation. We also demonstrate that it can differentiate between features that are relevant and irrelevant for a given task, thereby performing representation learning. Finally, we show on a real-world robot prediction task that TIDBD is able to outperform ordinary TD methods and TD methods augmented with AlphaBound and RMSprop.
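To make the idea of a per-feature step-size concrete, the following is a minimal sketch and not the algorithm as specified in the paper. It assumes linear value estimation, takes the IDBD update rules from the supervised setting, and substitutes the one-step TD error for the supervised error, with no eligibility traces. The function name `idbd_td0_step`, the meta step-size `theta`, and all default values are illustrative assumptions.

```python
import math

def idbd_td0_step(w, beta, h, x, reward, x_next, gamma=0.9, theta=0.01):
    """One IDBD-style TD(0) update with a separate step-size per feature.

    Assumed sketch: w, beta and h are lists of equal length and are
    modified in place. The step-size for weight i is exp(beta[i]).
    Returns the TD error computed before the update.
    """
    # Linear value estimates for the current and next feature vectors.
    v = sum(wi * xi for wi, xi in zip(w, x))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    delta = reward + gamma * v_next - v  # one-step TD error

    for i, xi in enumerate(x):
        # Meta-descent: move log step-size along the correlation between
        # the current update direction and the trace of recent updates h.
        beta[i] += theta * delta * xi * h[i]
        alpha = math.exp(beta[i])
        # Ordinary TD(0) weight update, using this feature's own step-size.
        w[i] += alpha * delta * xi
        # Decaying trace of recent weight updates (decay clipped at zero).
        h[i] = h[i] * max(0.0, 1.0 - alpha * xi * xi) + alpha * delta * xi
    return delta

# Two features: only the first is ever active, so only its step-size adapts.
w = [0.0, 0.0]
beta = [math.log(0.1), math.log(0.1)]
h = [0.0, 0.0]
for _ in range(2):
    idbd_td0_step(w, beta, h, x=[1.0, 0.0], reward=1.0, x_next=[0.0, 0.0])
```

In this toy run the step-size of the active feature grows because consecutive updates point in the same direction, while the step-size of the inactive feature stays at its initial value. This loosely illustrates how step-size adaptation can separate relevant from irrelevant features.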
