Reinforcement learning via kernel temporal difference

This paper introduces kernel Temporal Difference (TD)(λ), a kernel adaptive filter trained with stochastic gradient updates on temporal differences, to estimate the state-action value function in reinforcement learning. Only the case λ = 0 is studied here. Experimental results show the method's applicability to decoding motor states during a center-out reaching task performed by a monkey. The results are compared with those of a time delay neural network (TDNN) trained by backpropagating the temporal difference error. In these experiments, kernel TD(0) converges faster and reaches a better solution than the neural network.
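
As a rough illustration of the idea, the following minimal sketch shows one way a kernel TD(0) update can be written. It is not the authors' implementation. It assumes a Gaussian kernel, estimates a state value function rather than the state-action value function to keep the example short, and adds every visited state as a new kernel center with no sparsification. The class name `KernelTD0`, its parameters (`step_size`, `gamma`, `kernel_width`), and the two-state toy chain are all hypothetical and chosen only for illustration.

```python
import math


class KernelTD0:
    """Illustrative kernel TD(0) value estimator (assumed Gaussian kernel)."""

    def __init__(self, step_size=0.5, gamma=0.9, kernel_width=0.3):
        self.step_size = step_size        # learning rate of the stochastic update
        self.gamma = gamma                # discount factor
        self.kernel_width = kernel_width  # Gaussian kernel width (assumption)
        self.centers = []                 # stored input states (kernel centers)
        self.coeffs = []                  # one coefficient per stored center

    def _kernel(self, x, c):
        # Gaussian kernel between two state vectors.
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, c))
        return math.exp(-sq_dist / (2.0 * self.kernel_width ** 2))

    def value(self, x):
        # The estimate is a kernel expansion over all stored centers.
        return sum(a * self._kernel(x, c)
                   for a, c in zip(self.coeffs, self.centers))

    def update(self, x, reward, x_next=None):
        # TD(0) target: reward plus the discounted value of the next state.
        # A terminal transition (x_next is None) uses the reward alone.
        target = reward
        if x_next is not None:
            target += self.gamma * self.value(x_next)
        td_error = target - self.value(x)
        # Kernel LMS-style step: add x as a new center whose coefficient is
        # the step size times the temporal difference error.
        self.centers.append(tuple(x))
        self.coeffs.append(self.step_size * td_error)
        return td_error


# Hypothetical toy chain: state (0.0,) -> state (1.0,) -> terminal,
# with reward 1 received only on the final transition.
model = KernelTD0()
for _ in range(60):
    model.update((0.0,), reward=0.0, x_next=(1.0,))
    model.update((1.0,), reward=1.0)
```

In this toy chain the estimate for the final state should approach the terminal reward, and the estimate for the first state should approach the discounted value of its successor. Because the expansion grows by one center per update, a practical version would need some form of sparsification, which this sketch omits.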
