Discrete space reinforcement learning algorithm based on support vector machine classification

Abstract When facing discrete-space learning problems, traditional reinforcement learning algorithms often suffer from slow convergence and poor convergence accuracy. Deep reinforcement learning requires a large number of learning samples, so it often struggles to converge and easily falls into local minima. To address these problems, we apply support vector machine classification to reinforcement learning and propose an algorithm named Advantage Actor-Critic with Support Vector Machine Classification (SVM-A2C). Our algorithm adopts the actor-critic framework: the actor produces its action output through support vector machine classification, while the critic uses the advantage function to improve and optimize the parameters of the support vector machine. In addition, because the environment in reinforcement learning changes continuously, it is difficult to find a globally optimal solution for the support vector machine; we therefore apply gradient descent to optimize its parameters, so that the agent can quickly learn a more precise action-selection policy. Finally, the effectiveness of the proposed method is demonstrated in classical reinforcement learning benchmark environments: the proposed algorithm requires fewer episodes to converge and produces more accurate results than competing algorithms.
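The abstract does not give the update equations, so the following is a minimal sketch of how such an agent might be wired together, assuming a linear multiclass SVM actor whose per-action margins are turned into a softmax policy, a linear critic, and a one-step TD error as the advantage estimate. All names and hyperparameters here, and the L2 term standing in for the SVM's max-margin objective, are our assumptions rather than the paper's specification.

import numpy as np

class SVMA2C:
    """Minimal sketch of SVM-A2C under the assumptions stated above."""

    def __init__(self, n_features, n_actions,
                 lr_actor=1e-2, lr_critic=1e-2, gamma=0.99, reg=1e-4):
        self.W = np.zeros((n_actions, n_features))  # actor: one SVM hyperplane per discrete action
        self.v = np.zeros(n_features)               # critic: linear state-value weights
        self.lr_actor, self.lr_critic = lr_actor, lr_critic
        self.gamma, self.reg = gamma, reg

    def policy(self, s):
        # Per-action SVM margins converted into a probability distribution.
        scores = self.W @ s
        scores -= scores.max()                      # numerical stability
        p = np.exp(scores)
        return p / p.sum()

    def act(self, s):
        return np.random.choice(self.W.shape[0], p=self.policy(s))

    def update(self, s, a, r, s_next, done):
        # Critic: the one-step TD error doubles as the advantage estimate,
        # A(s, a) ~ r + gamma * V(s') - V(s).
        v_s = self.v @ s
        v_next = 0.0 if done else self.v @ s_next
        advantage = r + self.gamma * v_next - v_s
        self.v += self.lr_critic * advantage * s    # gradient step on squared TD error

        # Actor: advantage-weighted policy-gradient step on the softmax over
        # SVM margins; the L2 term is our stand-in for the max-margin
        # regularizer, not a detail taken from the paper.
        p = self.policy(s)
        grad = -np.outer(p, s)                      # softmax part of d log pi(a|s) / dW
        grad[a] += s                                # chosen-action row
        self.W += self.lr_actor * (advantage * grad - self.reg * self.W)
        return advantage

On a classical benchmark such as CartPole, s would be the four-dimensional observation vector with n_actions = 2, and update would be called once per environment step; this is one plausible reading of the abstract, not a reconstruction of the authors' exact method.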
