IMPROVED TEMPORAL DIFFERENCE METHODS WITH LINEAR FUNCTION APPROXIMATION¹

We consider temporal difference algorithms in the context of infinite-horizon, finite-state dynamic programming problems with discounted cost and linear cost function approximation. We show, under standard assumptions, that a least squares-based temporal difference method, proposed by Nedić and Bertsekas [NeB03], converges with a stepsize equal to 1. To our knowledge, this is the first iterative temporal difference method that converges without requiring a diminishing stepsize. We discuss the connections of the method with Sutton's TD(λ) and with various versions of least squares-based value iteration, and we show via analysis and experiment that the method is substantially and often dramatically faster than TD(λ), as well as simpler and more reliable. We also discuss the relation of our method to the LSTD method of Boyan [Boy02] and of Bradtke and Barto [BrB96].

1 Research supported by NSF Grant ECS-0218328 and Grant III.5(157)/99-ET from the Dept. of Science and Technology, Government of India. Thanks are due to Janey Yu for her assistance with the computational experimentation.
2 Lab. for Information and Decision Systems, M.I.T., Cambridge, MA 02139.
3 School of Technology and Computer Science, Tata Institute of Fundamental Research, Homi Bhabha Road, Mumbai 400005, India.
4 Alphatech, Inc., Burlington, MA.
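To make the unit-stepsize claim concrete, the following is a minimal sketch, not the authors' implementation. It assumes the method takes the least squares policy evaluation form r ← r + B⁻¹(A r + b) in the special case λ = 0; this form is an assumption and is not quoted from the text above. The three-state chain `P`, the costs `G`, the features `PHI`, the discount factor `gamma`, and the regularization `delta` are all hypothetical values chosen for illustration. The sketch runs the iteration with stepsize 1 along a single simulated trajectory and also computes the LSTD(0) solution from the same data, so the two parameter vectors can be compared.

```python
import random

# Hypothetical 3-state Markov chain, one-stage costs, and 2-dimensional features.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
G = [1.0, 0.0, 2.0]
PHI = [[1.0, 0.0],
       [1.0, 1.0],
       [1.0, 2.0]]


def solve2(M, v):
    """Solve the 2x2 linear system M x = v by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(v[0] * M[1][1] - M[0][1] * v[1]) / det,
            (M[0][0] * v[1] - v[0] * M[1][0]) / det]


def lspe0_vs_lstd0(num_steps=20000, gamma=0.8, delta=1.0, seed=0):
    """Run a unit-stepsize least squares iteration (lambda = 0) on one trajectory.

    Returns the final iterate and the LSTD(0) solution built from the same samples.
    """
    rng = random.Random(seed)
    # B starts at delta * I so that it is invertible from the first step (assumption).
    B = [[delta, 0.0], [0.0, delta]]
    A = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    r = [0.0, 0.0]
    state = 0
    for _ in range(num_steps):
        nxt = rng.choices(range(3), weights=P[state])[0]
        phi, phi_next = PHI[state], PHI[nxt]
        for i in range(2):
            b[i] += phi[i] * G[state]
            for j in range(2):
                B[i][j] += phi[i] * phi[j]
                A[i][j] += phi[i] * (gamma * phi_next[j] - phi[j])
        # Stepsize-1 update: r <- r + B^{-1} (A r + b).
        residual = [A[i][0] * r[0] + A[i][1] * r[1] + b[i] for i in range(2)]
        step = solve2(B, residual)
        r = [r[0] + step[0], r[1] + step[1]]
        state = nxt
    # LSTD(0) solves A r + b = 0 directly with the same accumulated statistics.
    r_lstd = solve2(A, [-b[0], -b[1]])
    return r, r_lstd


if __name__ == "__main__":
    r_lspe, r_lstd = lspe0_vs_lstd0()
    print("unit-stepsize iterate:", r_lspe)
    print("LSTD(0) solution:     ", r_lstd)
```

In this sketch no stepsize schedule needs tuning: the only free constants are the discount factor and the initial regularization of B, and the regularization drops out of the equation A r + b = 0 that defines the LSTD(0) solution.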