Continuous state/action reinforcement learning: A growing self-organizing map approach

This paper proposes an algorithm for reinforcement learning (RL) problems with continuous state and action spaces. Continuous state spaces have been studied extensively, but RL problems with continuous action spaces remain comparatively underexplored. To cope with the non-stationary, very large, and continuous nature of such problems, the proposed algorithm uses two growing self-organizing maps (GSOMs) to approximate the state and action spaces through the addition and deletion of neurons. GSOM has been shown to outperform the standard SOM in topology preservation, quantization error reduction, and approximation of non-stationary distributions. The proposed algorithm simultaneously seeks the best representation of the state space, accurate estimates of the Q-values, and an appropriate representation of the highly rewarded regions of the action space. Experimental results on delayed-reward, non-stationary, and large-scale problems demonstrate the strong performance of the proposed algorithm.

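As a rough illustration of the scheme the abstract describes, two growing maps that quantize the state and action spaces while a tabular Q-function is learned over their units, a minimal Python sketch is given below. The growth rule, thresholds, learning rates, and the env_step/env_reset interface are assumptions made for illustration, not the paper's actual implementation.

```python
# Illustrative sketch (not the authors' implementation): Q-learning over two
# growing self-organizing maps, one for states and one for actions.
import numpy as np

class GrowingSOM:
    """Set of prototype vectors that grows where quantization error is high."""

    def __init__(self, dim, n_init=4, lr=0.1, grow_threshold=0.5):
        self.units = [np.random.uniform(-1.0, 1.0, dim) for _ in range(n_init)]
        self.errors = [0.0] * n_init          # accumulated quantization error per unit
        self.lr = lr
        self.grow_threshold = grow_threshold

    def winner(self, x):
        dists = [np.linalg.norm(x - u) for u in self.units]
        return int(np.argmin(dists))

    def update(self, x):
        """Move the winning unit toward x; grow a new unit if its error is large."""
        w = self.winner(x)
        err = np.linalg.norm(x - self.units[w])
        self.units[w] += self.lr * (x - self.units[w])
        self.errors[w] += err
        if self.errors[w] > self.grow_threshold:
            # insert a new unit halfway between the sample and the overloaded unit
            self.units.append(0.5 * (self.units[w] + x))
            self.errors.append(0.0)
            self.errors[w] = 0.0
        return w

def gsom_q_learning(env_step, env_reset, state_dim, action_dim,
                    episodes=50, gamma=0.95, alpha=0.2, epsilon=0.1):
    """Tabular Q-learning where states and actions are indexed by GSOM winners.

    env_reset() -> continuous state; env_step(action) -> (state, reward, done).
    Both are hypothetical stand-ins for whatever environment is used.
    """
    state_map = GrowingSOM(state_dim)
    action_map = GrowingSOM(action_dim)
    q = {}                                      # (state_unit, action_unit) -> value

    def q_val(s, a):
        return q.get((s, a), 0.0)

    for _ in range(episodes):
        s = state_map.update(env_reset())
        done = False
        while not done:
            # epsilon-greedy choice over the current action prototypes
            if np.random.rand() < epsilon:
                a = np.random.randint(len(action_map.units))
            else:
                a = max(range(len(action_map.units)), key=lambda i: q_val(s, i))
            action = action_map.units[a]        # continuous action executed in the env
            next_state, reward, done = env_step(action)
            s_next = state_map.update(next_state)
            best_next = max(q_val(s_next, i) for i in range(len(action_map.units)))
            q[(s, a)] = q_val(s, a) + alpha * (reward + gamma * best_next - q_val(s, a))
            # pull highly rewarded action prototypes toward the executed action
            if reward > 0:
                action_map.update(action)
            s = s_next
    return state_map, action_map, q
```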