Reinforcement Learning with Kernel Recursive Least-Squares Support Vector Machine

This paper proposes a reinforcement learning system for continuous state spaces based on the kernel recursive least-squares algorithm. A kernel recursive least-squares support vector machine is used to realize a mapping from state-action pairs to the Q-value function. An online sparsification process admits a training sample into the Q-function approximation only if it is approximately linearly independent of the preceding training samples. Simulation results for a two-link robot manipulator show that the proposed method has high learning efficiency: better accuracy, measured in terms of mean square error, and lower computation time than the least-squares support vector machine.

I. INTRODUCTION

The support vector machine (SVM), which is based on Vapnik's structural risk minimization (SRM) (1), has become one of the most popular methods for solving classification and regression problems. Conventional SVMs offer global optimization and good adaptability. However, the optimal solution is obtained by solving a standard quadratic programming problem, which incurs a high computational cost. To reduce this cost, the least-squares support vector machine (LS-SVM) was proposed in (2); it converts the inequality constraints into equality constraints, so that the solution follows from a set of linear equations. LS-SVM has been successfully applied to reinforcement learning (RL) problems (3): an RL problem is converted into a regression problem in which the observed states and actions are the inputs and the Q-values are the outputs. In LS-SVM, however, all training samples may become support vectors, so the support vectors are no longer sparse, which may lead to poor generalization. In addition, as the number of training pairs grows, the number of equations grows with it, which may result in a higher computational cost. Our focus in this paper is on reinforcement learning problems.
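
To make the sparsification idea from the abstract concrete, the following Python sketch shows one common form of the approximate linear dependence test used in kernel recursive least-squares: a new state-action sample is admitted only if its feature-space image cannot be approximated, within a tolerance, by the samples already stored. The Gaussian kernel, the threshold `nu`, the kernel width `sigma`, and the names `rbf_kernel` and `ALDDictionary` are assumptions made for illustration; they are not taken from this paper.

```python
import numpy as np


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel (an assumed choice; the source does not fix a kernel)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(d, d) / (2.0 * sigma ** 2)))


class ALDDictionary:
    """Hypothetical helper: keeps only approximately linearly independent samples."""

    def __init__(self, nu=1e-3, sigma=1.0):
        self.nu = nu          # assumed tolerance of the dependence test
        self.sigma = sigma    # assumed kernel width
        self.samples = []     # stored state-action vectors
        self.K_inv = None     # inverse kernel matrix of the stored samples

    def try_add(self, x):
        """Return True if x was added to the dictionary, False if rejected."""
        x = np.asarray(x, dtype=float)
        k_xx = rbf_kernel(x, x, self.sigma)
        if not self.samples:
            # The first sample is always stored.
            self.samples.append(x)
            self.K_inv = np.array([[1.0 / k_xx]])
            return True
        # Kernel values between x and every stored sample.
        k_vec = np.array([rbf_kernel(s, x, self.sigma) for s in self.samples])
        # Best coefficients for approximating x by the stored samples.
        a = self.K_inv @ k_vec
        # Squared residual of that approximation in feature space.
        delta = k_xx - k_vec @ a
        if delta <= self.nu:
            return False  # approximately linearly dependent: do not store
        # Grow the inverse kernel matrix with the block-inverse formula.
        m = len(self.samples)
        K_inv_new = np.empty((m + 1, m + 1))
        K_inv_new[:m, :m] = delta * self.K_inv + np.outer(a, a)
        K_inv_new[:m, m] = -a
        K_inv_new[m, :m] = -a
        K_inv_new[m, m] = 1.0
        self.K_inv = K_inv_new / delta
        self.samples.append(x)
        return True
```

In use, each input would be a state vector concatenated with an action, and `try_add` would be called once per observed pair; a larger `nu` yields a smaller dictionary at the price of a coarser approximation.
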
The objective at hand is to explore the use of a support vector machine with sparse support vectors, low computational cost, and satisfactory accuracy. The kernel recursive least-squares SVM (KRLS-SVM) (4) is a strong candidate for achieving this objective. We develop a KRLS-SVM algorithm for reinforcement learning and demonstrate its potential through a case study of a two-link robot manipulator. The mean square error accuracy, computational