Recursive Least Squares Policy Control with Echo State Network

The echo state network (ESN) is a special type of recurrent neural network for processing time-series data. However, because the sequential samples collected by the agent are strongly correlated, it is difficult for ESN-based policy control algorithms to update the ESN's parameters with the recursive least squares (RLS) algorithm. To solve this problem, we propose two novel policy control algorithms, ESNRLS-Q and ESNRLS-Sarsa. First, to reduce the correlation of training samples, we use the leaky integrator ESN and the mini-batch learning mode. Second, to make RLS suitable for training the ESN in mini-batch mode, we present a new mean-approximation method for updating the RLS autocorrelation matrix. Third, to prevent the ESN from over-fitting, we use L1 regularization. Finally, to prevent overestimation of the target state-action value, we employ the Mellowmax method. Simulation results show that both algorithms converge well.
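The two building blocks named in the abstract, a leaky integrator ESN and an RLS-trained readout, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The leaky state update uses the common form x <- (1 - a) x + a tanh(W_in u + W x), and the readout is trained with the standard exponentially weighted, per-sample RLS update. The class name `LeakyESN` and the parameters `leak`, `spectral_radius`, `forgetting`, and `delta` are hypothetical. The paper's mean-approximation for mini-batch updates and its L1 regularization are not reproduced.

```python
import numpy as np


class LeakyESN:
    """Hypothetical leaky-integrator ESN whose linear readout is trained online by standard RLS."""

    def __init__(self, n_in, n_res, n_out, leak=0.3, spectral_radius=0.9,
                 forgetting=0.999, delta=100.0, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random input and reservoir weights (never trained).
        self.W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
        W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
        # Rescale so the largest eigenvalue magnitude equals spectral_radius.
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        self.leak = leak
        self.x = np.zeros(n_res)
        # Trainable readout: one row per output (e.g. one Q-value per action).
        self.W_out = np.zeros((n_out, n_res))
        # P estimates the inverse of the exponentially weighted state autocorrelation matrix.
        self.P = delta * np.eye(n_res)
        self.lam = forgetting

    def step(self, u):
        """Leaky-integrator update: x <- (1 - a) * x + a * tanh(W_in u + W x)."""
        pre = self.W_in @ np.asarray(u, dtype=float) + self.W @ self.x
        self.x = (1.0 - self.leak) * self.x + self.leak * np.tanh(pre)
        return self.x

    def predict(self, x):
        return self.W_out @ x

    def rls_update(self, x, target):
        """One exponentially weighted RLS step on the readout for a single sample."""
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)          # gain vector
        err = target - self.W_out @ x         # a priori error, one entry per output
        self.W_out += np.outer(err, k)
        # Rank-one (Sherman-Morrison) update of the inverse autocorrelation estimate.
        self.P = (self.P - np.outer(k, Px)) / self.lam
```

For a Q-learning-style step with one output per action, one could set `target = esn.predict(x)`, overwrite only `target[a]` with the TD target for the taken action `a`, and call `esn.rls_update(x, target)`. The other entries then have zero error and their readout rows stay unchanged, while the shared matrix `P` is still updated, because it depends only on the reservoir state.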
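The Mellowmax operator mentioned in the abstract replaces the hard max in the bootstrap target. Asadi and Littman define it as mm_w(q) = (1/w) log((1/n) sum_i exp(w q_i)). It moves from the mean of q (w -> 0) toward the max (w -> infinity), which tempers the overestimation a hard max introduces. The sketch below computes it stably and uses it in a Q-learning-style target. The function names and the default `omega` are illustrative assumptions. How the paper chooses w, and how it applies Mellowmax inside ESNRLS-Sarsa, is not shown here.

```python
import numpy as np


def mellowmax(q, omega=5.0):
    """Mellowmax: (1/omega) * log(mean(exp(omega * q))), computed stably."""
    q = np.asarray(q, dtype=float)
    m = q.max()
    # Subtract the max before exponentiating to avoid overflow (log-sum-exp trick).
    return m + np.log(np.mean(np.exp(omega * (q - m)))) / omega


def mellow_q_target(reward, next_q, gamma=0.99, omega=5.0, done=False):
    """Hypothetical Q-learning target with mellowmax in place of max over next actions."""
    if done:
        return reward
    return reward + gamma * mellowmax(next_q, omega)
```

Because mellowmax never exceeds the maximum of `next_q`, the resulting target is never larger than the standard max-based target for the same rewards and next-state values.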
