Receding Horizon Cache and Extreme Learning Machine based Reinforcement Learning

Function approximators have been widely used in Reinforcement Learning (RL) to handle problems with large or continuous state spaces. However, batch-learning Neural Networks (NNs), among the most common approximators, have rarely been applied to RL. In this paper, possible reasons for this are examined and a solution is proposed. Specifically, a Receding Horizon Cache (RHC) structure is designed to collect training data for the NN by dynamically archiving state-action pairs and actively updating their Q-values, which makes batch-learning NNs much easier to apply. Combined with the Extreme Learning Machine (ELM), a new RL-with-function-approximation algorithm, termed RHC- and ELM-based RL (RHC-ELM-RL), is proposed. A mountain-car task was carried out to test RHC-ELM-RL and compare its performance with that of other algorithms.
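The core idea described above can be sketched in a few lines: a fixed-capacity cache archives (state, action, Q-value) samples, and an ELM (a single-hidden-layer network with random, fixed input weights and a least-squares output layer) is batch-fit to that cache. This is a minimal illustrative sketch, not the authors' implementation; all class names, sizes, and the eviction policy (oldest samples recede first) are assumptions.

```python
from collections import deque
import numpy as np

class ELMRegressor:
    """Single-hidden-layer ELM: random fixed input weights, least-squares output."""
    def __init__(self, n_hidden=40, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)
        self.W = None      # random input weights, drawn once and frozen
        self.b = None      # random hidden biases
        self.beta = None   # output weights, solved in closed form

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        if self.W is None:
            self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
            self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Batch least squares via the Moore-Penrose pseudo-inverse
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

class RecedingHorizonCache:
    """Fixed-capacity archive of (state, action, Q) training samples."""
    def __init__(self, capacity=500):
        self.buf = deque(maxlen=capacity)  # oldest samples drop off the horizon

    def add(self, state, action, q):
        self.buf.append((np.asarray(state, dtype=float), float(action), float(q)))

    def training_set(self):
        # Concatenate state and action as the regression input
        X = np.array([np.append(s, a) for s, a, _ in self.buf])
        y = np.array([q for _, _, q in self.buf])
        return X, y

# Usage: archive samples as the agent explores, then batch-fit the ELM.
cache = RecedingHorizonCache(capacity=100)
rng = np.random.default_rng(1)
for _ in range(200):
    s = rng.uniform(-1.0, 1.0, size=2)
    a = rng.integers(0, 2)
    cache.add(s, a, s.sum() + a)   # stand-in Q target for illustration
X, y = cache.training_set()
elm = ELMRegressor(n_hidden=40).fit(X, y)
```

Because the ELM's hidden layer is fixed, each refit reduces to one linear solve over the cached batch, which is what makes pairing a batch learner with an incrementally updated cache practical.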
