Empirical Comparison of Gradient Descent and Exponentiated Gradient Descent in Supervised and Reinforcement Learning

This report describes a series of results using the exponentiated gradient descent (EG) method recently proposed by Kivinen and Warmuth. Prior work is extended by comparing speed of learning on a nonstationary problem and on an extension to backpropagation networks. Most significantly, we present an extension of the EG method to temporal-difference and reinforcement learning. This extension is compared to conventional reinforcement learning methods on two test problems using CMAC function approximators and replace traces. On the larger of the two problems, the average loss was approximately 25% smaller for the EG method. The relative computational complexity and parameter sensitivity of the two methods are also discussed.
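To make the contrast concrete, the sketch below shows the two update rules side by side for a linear predictor with squared loss: ordinary gradient descent takes an additive step against the gradient, while Kivinen and Warmuth's EG update multiplies each weight by an exponential of its gradient component and renormalizes so the (positive) weights sum to one. This is a minimal illustration of the baseline updates only; the function names, the learning rate, and the simple squared-loss setting are assumptions for the example, not the paper's TD/CMAC implementation.

```python
import numpy as np

def gd_step(w, x, y, eta):
    # Gradient descent on squared loss (w.x - y)^2 / 2:
    # additive update against the gradient err * x.
    err = w @ x - y
    return w - eta * err * x

def eg_step(w, x, y, eta):
    # Exponentiated gradient (EG) update (Kivinen & Warmuth):
    # multiplicative step exp(-eta * gradient component),
    # then renormalize so the positive weights sum to 1.
    err = w @ x - y
    v = w * np.exp(-eta * err * x)
    return v / v.sum()

# Toy example (assumed data): one informative input component.
w = np.ones(3) / 3
x = np.array([1.0, 0.0, 0.0])
y = 1.0
w_gd = gd_step(w, x, y, 0.5)
w_eg = eg_step(w, x, y, 0.5)
```

Note the qualitative difference: the EG update keeps weights positive and normalized, concentrating weight multiplicatively on relevant inputs, which is the property the comparisons in this report exploit.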