Q-learning System Based on Cooperative Least Squares Support Vector Machine

Abstract: To address the slow convergence of reinforcement learning systems, a Q-learning system based on a cooperative least squares support vector machine is proposed for problems with a continuous state space and a discrete action space. The proposed Q-learning system is composed of a least squares support vector regression machine (LS-SVRM) and a least squares support vector classification machine (LS-SVCM). The LS-SVRM approximates the mapping from a state-action pair to a value function, and the LS-SVCM approximates the mapping from the continuous state space to the discrete action space. In addition, the LS-SVCM supplies the LS-SVRM with dynamic, real-time knowledge or advice (a suggested action) to accelerate its learning process. Simulation studies on a mountain car control problem show that, compared with a Q-learning system based on a single LS-SVRM, the proposed Q-learning system converges faster and achieves better learning performance.
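The two ingredients named in the abstract can be illustrated with a minimal sketch. It assumes the standard LS-SVM regression formulation, in which training reduces to solving one linear system, together with the usual Q-learning target. The function names, the RBF kernel, the hyperparameter values, and in particular the `select_action` rule for mixing in the classifier's suggested action are hypothetical: the abstract does not specify how the advice is combined with the value estimates.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Gaussian kernel matrix between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    # Standard LS-SVM regression: solve
    #   [0   1^T          ] [b    ]   [0]
    #   [1   K + I / gamma] [alpha] = [y]
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, dual weights alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    # Value estimate for new inputs (here: encoded state-action pairs).
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

def q_target(reward, next_q_values, discount=0.95):
    # Usual Q-learning regression target: r + discount * max_a' Q(s', a').
    return reward + discount * np.max(next_q_values)

def select_action(q_values, advised_action, rng, epsilon=0.1, trust=0.5):
    # Hypothetical mixing rule (an assumption, not the paper's method):
    # explore with probability epsilon, otherwise follow the classifier's
    # advised action with probability trust, otherwise act greedily.
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    if rng.random() < trust:
        return int(advised_action)
    return int(np.argmax(q_values))
```

In such a setup, each row of `X` would hold a state concatenated with an action encoding, `y` would hold the targets from `q_target`, and the regressor would be refit (or updated) as new transitions arrive. The advised action would come from a separately trained classifier over states.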
