Comparing Value-Function Estimation Algorithms in Undiscounted Problems

We compare the scaling properties of several value-function estimation algorithms. In particular, we prove that Q-learning can scale exponentially slowly with the number of states. We identify the reasons for this slow convergence and show that both TD(λ) and learning with a fixed learning rate enjoy rather fast convergence, just like the model-based method.
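The contrast between learning-rate schedules can be illustrated on the simplest possible estimation problem. The sketch below (an illustration, not the paper's experimental setup; all function and variable names are invented here) applies the Q-learning-style stochastic update q ← q + α(r − q) to estimate a noisy reward's mean, once with the classical 1/n decaying step size and once with a fixed step size:

```python
import random

def estimate(reward_fn, steps, alpha_fn):
    """Estimate an expected reward with the incremental update
    q <- q + alpha * (r - q), where alpha may depend on the step count."""
    q = 0.0
    for n in range(1, steps + 1):
        r = reward_fn()
        q += alpha_fn(n) * (r - q)
    return q

# Noisy reward with true mean 1.0 (hypothetical toy problem).
reward = lambda: random.gauss(1.0, 0.5)

random.seed(0)
q_decaying = estimate(reward, 10_000, lambda n: 1.0 / n)  # 1/n schedule: exact sample mean
random.seed(0)
q_fixed = estimate(reward, 10_000, lambda n: 0.05)        # fixed learning rate

print(q_decaying, q_fixed)
```

With a single state both schedules converge near the true mean; the 1/n schedule yields the exact sample average, while the fixed rate trades a small residual variance for the faster tracking behavior the abstract attributes to fixed-learning-rate methods.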