An effective asynchronous framework for small scale reinforcement learning problems

Reinforcement learning has been one of the research hotspots in artificial intelligence in recent years, and deep reinforcement learning has been widely used to solve various decision-making problems. However, owing to the characteristics of neural networks, deep methods easily fall into local minima when facing small-scale discrete-space path planning problems. Traditional reinforcement learning relies on the continual updates of a single agent during execution, which leads to slow convergence. Although several improvements have been proposed to address these problems, many shortcomings remain. To overcome them, this paper proposes a new asynchronous framework for tabular reinforcement learning algorithms and presents four new asynchronous variants of reinforcement learning algorithms. We apply these algorithms to standard reinforcement learning environments: the frozen lake, cliff walking, and windy gridworld problems. Simulation results show that the proposed methods solve discrete-space path planning problems efficiently and balance exploration and exploitation well.
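The abstract does not spell out the four variants, so the following is only a minimal sketch of the general idea under stated assumptions: several worker threads, each interacting with its own copy of a small deterministic grid-world, asynchronously update a single shared Q-table with one-step Q-learning and epsilon-greedy exploration. The environment, hyper-parameters, and update rule are illustrative choices, not the paper's exact algorithms.

```python
# Hypothetical sketch of asynchronous tabular Q-learning: multiple workers,
# one shared Q-table, no synchronization barrier between updates.
import threading
import random

import numpy as np

N_STATES = 16          # 4x4 grid, states numbered row-major
N_ACTIONS = 4          # 0=up, 1=down, 2=left, 3=right
GOAL = N_STATES - 1
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.2   # illustrative hyper-parameters

Q = np.zeros((N_STATES, N_ACTIONS))      # shared table, updated in place


def step(state, action):
    """Deterministic 4x4 grid-world transition; reward 1 only at the goal."""
    row, col = divmod(state, 4)
    if action == 0:
        row = max(row - 1, 0)
    elif action == 1:
        row = min(row + 1, 3)
    elif action == 2:
        col = max(col - 1, 0)
    else:
        col = min(col + 1, 3)
    nxt = row * 4 + col
    return nxt, float(nxt == GOAL), nxt == GOAL


def worker(n_episodes, rng):
    for _ in range(n_episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy action selection on the shared table
            if rng.random() < EPSILON:
                action = rng.randrange(N_ACTIONS)
            else:
                action = int(np.argmax(Q[state]))
            nxt, reward, done = step(state, action)
            # one-step Q-learning update written directly into the shared table
            target = reward + GAMMA * np.max(Q[nxt]) * (not done)
            Q[state, action] += ALPHA * (target - Q[state, action])
            state = nxt


threads = [threading.Thread(target=worker, args=(500, random.Random(i)))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("Greedy policy (0=up, 1=down, 2=left, 3=right):")
print(np.argmax(Q, axis=1).reshape(4, 4))
```

Because of Python's global interpreter lock the threads above interleave rather than run truly in parallel, but the sketch still shows the key property of the asynchronous setting: many agents read from and write to one table without waiting for each other.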
