KERNEL TEMPORAL DIFFERENCES FOR REINFORCEMENT LEARNING WITH APPLICATIONS TO BRAIN MACHINE INTERFACES

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

KERNEL TEMPORAL DIFFERENCES FOR REINFORCEMENT LEARNING WITH APPLICATIONS TO BRAIN MACHINE INTERFACES

By Jihye Bae
August 2013
Chair: Jose C. Principe
Major: Electrical and Computer Engineering

Reinforcement learning brain machine interfaces (RLBMI) have been shown to be a promising avenue for practical implementations of BMIs. In the RLBMI, a computer agent and a user in the environment cooperate and learn co-adaptively. An essential component of the agent is the neural decoder, which translates the neural states of the user into control actions for the external device in the environment. However, to realize the advantages of the RLBMI in practice, several challenges need to be addressed. First, the neural decoder must be able to handle high-dimensional neural states containing spatiotemporal information. Second, the mapping from neural states to actions must be flexible, without relying on strong assumptions about its form. Third, the computational complexity of the decoder should be low enough that real-time implementation is feasible. Fourth, the decoder should be robust to outliers and perturbations in the environment. We introduce algorithms that address these four issues. To handle the high-dimensional state spaces efficiently, we adopt temporal difference (TD) learning, which allows the state value function to be learned using function approximation. For a flexible decoder, we propose kernel-based representations, which provide a nonlinear extension of TD(λ) that we call kernel temporal difference, KTD(λ). Two key advantages of KTD(λ) are its nonlinear function approximation capabilities and convergence guarantees that gracefully emerge as
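To make the kernel-based TD idea above concrete, the following is a minimal Python sketch of a KTD(λ)-style value-function learner: the value estimate is a kernel expansion over visited states, and the TD error is applied through eligibility traces attached to each center. The Gaussian kernel, the learning rate, the γλ trace decay (Sutton and Barto's accumulating-trace convention), the ever-growing dictionary with no sparsification, and the synthetic state sequence are all illustrative assumptions, not details taken from the dissertation.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two state vectors."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

class KTDLambda:
    """Minimal kernel temporal difference, KTD(lambda), value-function learner.

    The value estimate is a kernel expansion V(x) = sum_i alpha_i k(c_i, x),
    where every visited state becomes a center (no sparsification here).
    """

    def __init__(self, eta=0.1, gamma=0.9, lam=0.5, sigma=1.0):
        self.eta, self.gamma, self.lam, self.sigma = eta, gamma, lam, sigma
        self.centers = []   # stored states c_i
        self.alphas = []    # expansion coefficients alpha_i
        self.traces = []    # eligibility trace per center

    def value(self, x):
        return sum(a * gaussian_kernel(c, x, self.sigma)
                   for a, c in zip(self.alphas, self.centers))

    def update(self, x, reward, x_next, terminal=False):
        """One KTD(lambda) step: add the current state as a center and
        apply the TD error through the eligibility traces."""
        v_next = 0.0 if terminal else self.value(x_next)
        td_error = reward + self.gamma * v_next - self.value(x)

        # Decay existing traces, then add the new center with a unit trace.
        self.traces = [self.gamma * self.lam * e for e in self.traces]
        self.centers.append(np.asarray(x, dtype=float))
        self.alphas.append(0.0)
        self.traces.append(1.0)

        # Gradient-style coefficient update weighted by the traces.
        for i, e in enumerate(self.traces):
            self.alphas[i] += self.eta * td_error * e
        return td_error

# Toy usage on a synthetic random-state sequence (illustrative only).
if __name__ == "__main__":
    ktd = KTDLambda()
    rng = np.random.default_rng(0)
    for episode in range(50):
        x = rng.normal(size=4)          # stand-in for a neural state vector
        for step in range(10):
            x_next = rng.normal(size=4)
            reward = 1.0 if step == 9 else 0.0
            ktd.update(x, reward, x_next, terminal=(step == 9))
            x = x_next
```

In this naive form the number of kernel centers grows with every sample, which is exactly the computational-complexity concern raised above; in practice a sparsification step (for example, quantizing new states onto existing centers) would keep the dictionary bounded.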
