Learning Feature Relevance Through Step Size Adaptation in Temporal-Difference Learning

There is a long history of using meta-learning as a form of representation learning, specifically for determining the relevance of inputs. In this paper, we examine an instance of meta-learning in which feature relevance is learned by adapting the step-size parameters of stochastic gradient descent, building on a variety of prior work in stochastic approximation, machine learning, and artificial neural networks. In particular, we focus on stochastic meta-descent as introduced in the Incremental Delta-Bar-Delta (IDBD) algorithm, which sets an individual step size for each feature of a linear function approximator. Under IDBD, a feature with a large step size has a large impact on generalization from training examples, and a feature with a small step size has a small impact. As a main contribution of this work, we extend IDBD to temporal-difference (TD) learning, a form of learning that is effective in sequential, non-i.i.d. problems. We derive a variety of IDBD generalizations for TD learning and demonstrate that they are able to distinguish which features are relevant and which are not. We demonstrate that TD IDBD is effective at learning feature relevance in both an idealized gridworld and a real-world robotic prediction task.

Patrick M. Pilarski | Richard S. Sutton | Vivek Veeriah | Alexandra Kearney | Jaden B. Travnik
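
The per-feature step-size adaptation that the abstract attributes to IDBD can be illustrated with a minimal Python sketch. This is the supervised, linear form of IDBD as it is commonly stated (log step sizes `beta`, a trace `h` of recent weight changes, and a meta step size `theta`); it is not the TD generalization derived in the paper. The toy task in `run_demo`, including its function name, feature counts, drift schedule, and default parameters, is a hypothetical setup chosen only for illustration.

```python
import math
import random


def idbd_step(w, beta, h, x, y, theta):
    """Apply one IDBD update in place and return the prediction error."""
    delta = y - sum(wi * xi for wi, xi in zip(w, x))
    for i, xi in enumerate(x):
        # Meta step: move the log step size along the correlation between
        # the current gradient term and the trace of recent updates.
        beta[i] += theta * delta * xi * h[i]
        alpha = math.exp(beta[i])
        # Base step: LMS update with a per-feature step size.
        w[i] += alpha * delta * xi
        # Decaying trace of recent weight changes, with the decay clipped at zero.
        h[i] = h[i] * max(0.0, 1.0 - alpha * xi * xi) + alpha * delta * xi
    return delta


def run_demo(n_steps=20000, n_relevant=5, n_irrelevant=15,
             theta=0.01, init_alpha=0.05, seed=0):
    """Hypothetical toy task: only the first n_relevant inputs affect y."""
    rng = random.Random(seed)
    n = n_relevant + n_irrelevant
    target_w = [1.0] * n_relevant + [0.0] * n_irrelevant
    w = [0.0] * n
    beta = [math.log(init_alpha)] * n
    h = [0.0] * n
    for t in range(n_steps):
        # Assumed drift: every 20 examples, flip the sign of one relevant
        # target weight so that relevant features keep needing to be tracked.
        if t > 0 and t % 20 == 0:
            j = rng.randrange(n_relevant)
            target_w[j] = -target_w[j]
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = sum(tw * xi for tw, xi in zip(target_w, x))
        idbd_step(w, beta, h, x, y, theta)
    # Return the learned per-feature step sizes.
    return [math.exp(b) for b in beta]
```

Calling `run_demo()` and comparing the mean of the first five returned step sizes with the mean of the remaining fifteen is one way to see relevant features end up with larger step sizes. The exact values depend on the seed and on the assumed settings.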
