Temporal difference learning with the conjugate gradient algorithm

This paper investigates the use of the conjugate gradient (CG) algorithm, in comparison with the traditional backpropagation (BP) algorithm, for training networks under the temporal difference (TD) method of reinforcement learning. Time series prediction is the application domain examined: simple synthetic series as well as more complex ones derived from real data (stock market indices) serve as benchmark problems. Performance is measured in terms of learning speed, generalization ability, and sensitivity to user-set parameters. Preliminary experimental results suggest that the performance of TD learning can be significantly improved when the CG algorithm is employed instead of traditional BP. In addition, as expected, the CG algorithm proved more robust and less dependent on user-set training parameters and initial conditions, especially for the more complicated time series. The use of the CG algorithm in TD learning is therefore promising for real-life applications in time series prediction.
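The abstract gives no implementation details. As a rough illustration of the TD side of the setup, the sketch below trains a linear TD(λ) predictor on a synthetic time series; it is a minimal assumption-laden sketch (linear model, illustrative hyperparameters), not the authors' method, which trained a neural network whose weight updates were driven by CG or BP.

```python
import math

def td_lambda_train(series, n_lags=4, alpha=0.02, gamma=0.9, lam=0.7, epochs=50):
    """Train a linear TD(lambda) predictor on a scalar time series (sketch).

    Each state is a window of lagged values plus a bias; the next observation
    plays the role of the reward, so V(s) estimates a discounted future sum.
    Returns the weights and the mean |TD error| per epoch.
    """
    w = [0.0] * (n_lags + 1)              # weights; last entry is a bias
    epoch_errors = []
    for _ in range(epochs):
        e = [0.0] * (n_lags + 1)          # eligibility traces
        total, steps = 0.0, 0
        for t in range(n_lags, len(series) - 1):
            x = series[t - n_lags:t] + [1.0]            # current state features
            x_next = series[t + 1 - n_lags:t + 1] + [1.0]
            v = sum(wi * xi for wi, xi in zip(w, x))
            v_next = sum(wi * xi for wi, xi in zip(w, x_next))
            delta = series[t + 1] + gamma * v_next - v  # TD error
            e = [gamma * lam * ei + xi for ei, xi in zip(e, x)]
            w = [wi + alpha * delta * ei for wi, ei in zip(w, e)]
            total += abs(delta)
            steps += 1
        epoch_errors.append(total / steps)
    return w, epoch_errors
```

The paper's contribution sits in the weight-update step: instead of the plain gradient-descent update shown here, successive search directions are chosen by a nonlinear conjugate gradient rule (with a line search), which is what yields the reported gains in speed and robustness.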
