A Novel Update Mechanism for Q-Networks Based On Extreme Learning Machines

Reinforcement learning is a popular machine learning paradigm which can find near optimal solutions to complex problems. Most often, these procedures involve function approximation using neural networks with gradient based updates to optimise weights for the problem being considered. While this common approach generally works well, there are other update mechanisms which are largely unexplored in reinforcement learning. One such mechanism is Extreme Learning Machines. These were initially proposed to drastically improve the training speed of neural networks and have since seen many applications. Here we attempt to apply extreme learning machines to a reinforcement learning problem in the same manner as gradient based updates. This new algorithm is called Extreme Q-Learning Machine (EQLM). We compare its performance to a typical Q-Network on the cart-pole task - a benchmark reinforcement learning problem - and show EQLM has similar long-term learning performance to a Q-Network.

[1]  Chee Kheong Siew,et al.  Extreme learning machine: Theory and applications , 2006, Neurocomputing.

[2]  José Manuel López-Guede,et al.  A L-MCRS dynamics approximation by ELM for Reinforcement Learning , 2015, Neurocomputing.

[3]  Ben J. A. Kröse,et al.  Learning from delayed rewards , 1995, Robotics Auton. Syst..

[4]  David D. Cox,et al.  Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures , 2013, ICML.

[5]  Yuval Tassa,et al.  MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[6]  Alex Graves,et al.  Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[7]  David Silver,et al.  Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.

[8]  John J. Grefenstette,et al.  Evolutionary Algorithms for Reinforcement Learning , 1999, J. Artif. Intell. Res..

[9]  Qinghua Zheng,et al.  Regularized Extreme Learning Machine , 2009, 2009 IEEE Symposium on Computational Intelligence and Data Mining.

[10]  Richard S. Sutton,et al.  Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[11]  Richard S. Sutton,et al.  Generalization in ReinforcementLearning : Successful Examples UsingSparse Coarse , 1996 .

[12]  Donald Michie,et al.  BOXES: AN EXPERIMENT IN ADAPTIVE CONTROL , 2013 .

[13]  Manuel Graña,et al.  STATE-ACTION VALUE FUNCTION MODELED BY ELM IN REINFORCEMENT LEARNING FOR HOSE CONTROL PROBLEMS , 2013 .

[14]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[15]  B J Andrews,et al.  Computer simulation of FES standing up in paraplegia: a self-adaptive fuzzy controller with reinforcement learning. , 1998, IEEE transactions on rehabilitation engineering : a publication of the IEEE Engineering in Medicine and Biology Society.

[16]  Wojciech Zaremba,et al.  OpenAI Gym , 2016, ArXiv.

[17]  Dianhui Wang,et al.  Extreme learning machines: a survey , 2011, Int. J. Mach. Learn. Cybern..

[18]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[19]  John N. Tsitsiklis,et al.  Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.

[20]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[21]  Guang-Bin Huang,et al.  Extreme Learning Machine for Multilayer Perceptron , 2016, IEEE Transactions on Neural Networks and Learning Systems.

[22]  Qinyu. Zhu Extreme Learning Machine , 2013 .

[23]  R Bellman,et al.  On the Theory of Dynamic Programming. , 1952, Proceedings of the National Academy of Sciences of the United States of America.

[24]  Bo He,et al.  Target following for an autonomous underwater vehicle using regularized ELM-based reinforcement learning , 2015, OCEANS 2015 - MTS/IEEE Washington.

[25]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[26]  Longxin Lin Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching , 2004, Machine Learning.

[27]  Min Liu,et al.  An incremental extreme learning machine for online sequential learning problems , 2014, Neurocomputing.

[28]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[29]  Philip Bachman,et al.  Deep Reinforcement Learning that Matters , 2017, AAAI.