Paper information - IMPROVED TEMPORAL DIFFERENCE METHODS WITH LINEAR FUNCTION APPROXIMATION

We consider temporal difference algorithms within the context of infinite-horizon finite-state dynamic programming problems with discounted cost, and linear cost function approximation. We show, under standard assumptions, that a least squares-based temporal difference method, proposed by Nedić and Bertsekas [NeB03], converges with a stepsize equal to 1. To our knowledge, this is the first iterative temporal difference method that converges without requiring a diminishing stepsize. We discuss the connections of the method with Sutton's TD(λ) and with various versions of least squares-based value iteration, and we show via analysis and experiment that the method is substantially and often dramatically faster than TD(λ), as well as simpler and more reliable. We also discuss the relation of our method with the LSTD method of Boyan [Boy02], and Bradtke and Barto [BrB96].

Acknowledgments: Research supported by NSF Grant ECS-0218328 and Grant III.5(157)/99-ET from the Dept. of Science and Technology, Government of India. Thanks are due to Janey Yu for her assistance with the computational experimentation.

Affiliations: Lab. for Information and Decision Systems, M.I.T., Cambridge, MA 02139; School of Technology and Computer Science, Tata Institute of Fundamental Research, Homi Bhabha Road, Mumbai 400005, India; Alphatech, Inc., Burlington, MA.
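To make the idea of a least squares-based temporal difference iteration with stepsize 1 more concrete, here is a minimal sketch of the λ = 0 case. The abstract does not state the update formulas, so the recursion used here (accumulate least-squares statistics along one trajectory and apply the resulting correction with stepsize 1) is an assumption based on the usual description of such methods. The toy 4-state chain, the costs, the features, and all names (`lspe0`, `delta`, `P`, `g`, `Phi`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def lspe0(features, states, costs, alpha, delta=1e-3):
    """Sketch of a least-squares TD iteration (lambda = 0) with stepsize 1.

    Assumed recursion: r <- r + B^{-1} (A r + b), where B, A, b are running
    sums built from the simulated transitions. Also returns the LSTD-style
    solution of A r + b = 0 computed from the same sums.
    """
    s = features.shape[1]
    B = delta * np.eye(s)   # sum of phi phi^T; delta*I keeps B invertible early on
    A = np.zeros((s, s))    # sum of phi (alpha * phi_next - phi)^T
    b = np.zeros(s)         # sum of phi * cost
    r = np.zeros(s)
    for i, j, cost in zip(states[:-1], states[1:], costs):
        phi, phi_next = features[i], features[j]
        B += np.outer(phi, phi)
        A += np.outer(phi, alpha * phi_next - phi)
        b += cost * phi
        # Stepsize equal to 1: no diminishing gain is applied to the correction.
        r = r + np.linalg.solve(B, A @ r + b)
    r_lstd = np.linalg.solve(A, -b)
    return r, r_lstd

# Hypothetical toy problem: a 4-state random walk with cost depending on the state.
rng = np.random.default_rng(0)
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.0, 0.0, 0.5, 0.5]])
g = np.array([0.0, 1.0, 2.0, 3.0])             # cost incurred in each state
Phi = np.array([[1.0, i] for i in range(4)])   # two features: constant and state index
alpha = 0.5                                     # discount factor

# Simulate a single long trajectory of the chain.
T = 20000
states = np.zeros(T + 1, dtype=int)
for t in range(T):
    states[t + 1] = rng.choice(4, p=P[states[t]])
costs = g[states[:-1]]

r_lspe, r_lstd = lspe0(Phi, states, costs, alpha)
print("iterate with stepsize 1:", r_lspe)
print("LSTD-style solution:    ", r_lstd)
```

In this sketch the two weight vectors should nearly coincide for a long trajectory, since both are driven by the same accumulated statistics. The difference is that the iterative version never solves the full linear system in `A`, only systems in `B`.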

D. Bertsekas | V. Borkar | A. Nedić

[1] Pierre Priouret, et al. Adaptive Algorithms and Stochastic Approximations, 1990, Applications of Mathematics.

[2] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.

[3] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.

[4] S. Ioffe, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1996.

[5] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS 1996.

[6] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.

[7] On the Existence of Fixed Points for Approximate Value Iteration and Temporal-Difference Learning, 2000.

[8] Dimitri P. Bertsekas, et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation, 2003, Discret. Event Dyn. Syst.

[9] Peter Dayan, et al. The convergence of TD(λ) for general λ, 1992, Machine Learning.

[10] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.

[11] Justin A. Boyan, et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.

[12] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.