Global Versus Local Constructive Function Approximation for On-Line Reinforcement Learning

To scale to large state spaces, reinforcement learning (RL) algorithms need to apply function approximation techniques. Research on function approximation for RL has so far focused either on global methods with a static structure or on constructive architectures using locally responsive units. The former approach, whilst achieving some notable successes, has also failed on some relatively simple tasks. The locally constructive approach is more stable, but may scale poorly to higher-dimensional inputs. This paper examines two globally constructive algorithms based on the Cascade-Correlation (Cascor) supervised-learning algorithm. These algorithms are applied within the Sarsa RL algorithm, and their performance is compared against a multi-layer perceptron and a locally constructive algorithm, the Resource Allocating Network (RAN). It is shown that the globally constructive algorithms are less stable, but that on some tasks they achieve performance similar to the RAN whilst generating more compact solutions.
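To make the setup concrete, the sketch below shows on-line Sarsa with a fixed-structure neural network updated from the TD error, i.e. the multi-layer perceptron baseline rather than the constructive variants, which would additionally grow hidden units Cascor-style during learning. It is an illustrative sketch only: the environment interface, network sizes, and the step-size and exploration constants are hypothetical placeholders, not the paper's configuration.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes and learning constants (placeholders).
N_FEATURES, N_HIDDEN, N_ACTIONS = 4, 16, 2
ALPHA, GAMMA, EPSILON = 0.01, 0.99, 0.1

# One hidden layer of tanh units, one linear output per action.
W1 = rng.normal(0.0, 0.1, (N_HIDDEN, N_FEATURES))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(0.0, 0.1, (N_ACTIONS, N_HIDDEN))
b2 = np.zeros(N_ACTIONS)

def q_values(s):
    """Return Q(s, .) for every action, plus hidden activations."""
    h = np.tanh(W1 @ s + b1)
    return W2 @ h + b2, h

def epsilon_greedy(q):
    """Pick a random action with probability EPSILON, else the greedy one."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q))

def sarsa_step(s, a, r, s_next, a_next, done):
    """One on-line semi-gradient Sarsa update of the network weights."""
    global W1, b1
    q, h = q_values(s)
    q_next, _ = q_values(s_next)
    target = r if done else r + GAMMA * q_next[a_next]
    delta = target - q[a]                    # TD error for the taken action
    grad_h = delta * W2[a] * (1.0 - h ** 2)  # backprop using pre-update weights
    W2[a] += ALPHA * delta * h               # output weights for action a only
    b2[a] += ALPHA * delta
    W1 += ALPHA * np.outer(grad_h, s)        # hidden-layer weights
    b1 += ALPHA * grad_h

# Typical control loop (env is a hypothetical gym-style environment):
#   s = env.reset(); a = epsilon_greedy(q_values(s)[0])
#   while not done:
#       s2, r, done = env.step(a)
#       a2 = epsilon_greedy(q_values(s2)[0])
#       sarsa_step(s, a, r, s2, a2, done)
#       s, a = s2, a2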
