Temporal Difference Methods for General Projected Equations

We consider projected equations for approximate solution of high-dimensional fixed point problems within low-dimensional subspaces. We introduce an analytical framework based on an equivalence with variational inequalities, and algorithms that may be implemented with low-dimensional simulation. These algorithms originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD methods, which offer special implementation advantages and reduced overhead over the standard LSTD and LSPE methods, and can deal with near singularity in the associated matrix inversion. We develop deterministic iterative methods and their simulation-based versions, and we discuss a sharp qualitative distinction between them: the performance of the former is greatly affected by direction and feature scaling, yet the latter have the same asymptotic convergence rate regardless of scaling, because of their common simulation-induced performance bottleneck.

[1]  B. Martinet,et al.  R'egularisation d''in'equations variationnelles par approximations successives , 1970 .

[2]  M. A. Krasnoselʹskii,et al.  Approximate Solution of Operator Equations , 1972 .

[3]  R. Rockafellar Monotone Operators and the Proximal Point Algorithm , 1976 .

[4]  D. Bertsekas,et al.  Projection methods for variational inequalities with application to the traffic assignment problem , 1982 .

[5]  C. Fletcher Computational Galerkin Methods , 1983 .

[6]  John N. Tsitsiklis,et al.  Parallel and distributed computation , 1989 .

[7]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[8]  Dimitri P. Bertsekas,et al.  Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[9]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[10]  S. Ioffe,et al.  Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming , 1996 .

[11]  John N. Tsitsiklis,et al.  Analysis of Temporal-Diffference Learning with Function Approximation , 1996, NIPS.

[12]  Dimitri P. Bertsekas,et al.  Temporal Dierences-Based Policy Iteration and Applications in Neuro-Dynamic Programming 1 , 1997 .

[13]  Guanrong Chen,et al.  Approximate Solutions of Operator Equations , 1997 .

[14]  Andrew G. Barto,et al.  Reinforcement learning , 1998 .

[15]  John N. Tsitsiklis,et al.  Average cost temporal-difference learning , 1997, Proceedings of the 36th IEEE Conference on Decision and Control.

[16]  John N. Tsitsiklis,et al.  Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives , 1999, IEEE Trans. Autom. Control..

[17]  John N. Tsitsiklis,et al.  Actor-Critic Algorithms , 1999, NIPS.

[18]  Benjamin Van Roy,et al.  On the existence of fixed points for approximate value iteration and temporal-difference learning , 2000 .

[19]  Dimitri P. Bertsekas,et al.  Least Squares Policy Evaluation Algorithms with Linear Function Approximation , 2003, Discret. Event Dyn. Syst..

[20]  Yousef Saad,et al.  Iterative methods for sparse linear systems , 2003 .

[21]  F. Facchinei,et al.  Finite-Dimensional Variational Inequalities and Complementarity Problems , 2003 .

[22]  Steven J. Bradtke,et al.  Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.

[23]  Justin A. Boyan,et al.  Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.

[24]  A. Barto,et al.  Improved Temporal Difference Methods with Linear Function Approximation , 2004 .

[25]  Richard S. Sutton,et al.  Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[26]  David Choi,et al.  A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning , 2001, Discret. Event Dyn. Syst..

[27]  Warren B. Powell,et al.  Approximate Dynamic Programming - Solving the Curses of Dimensionality , 2007 .

[28]  D. Bertsekas,et al.  A Least Squares Q-Learning Algorithm for Optimal Stopping Problems , 2007 .

[29]  V. Borkar Stochastic Approximation: A Dynamical Systems Viewpoint , 2008, Texts and Readings in Mathematics.

[30]  Panos M. Pardalos,et al.  Approximate dynamic programming: solving the curses of dimensionality , 2009, Optim. Methods Softw..

[31]  Dimitri P. Bertsekas,et al.  Convergence Results for Some Temporal Difference Methods Based on Least Squares , 2009, IEEE Transactions on Automatic Control.

[32]  D. Bertsekas Projected Equations, Variational Inequalities, and Temporal Difference Methods , 2009 .

[33]  D. Bertsekas,et al.  Journal of Computational and Applied Mathematics Projected Equation Methods for Approximate Solution of Large Linear Systems , 2022 .

[34]  Huizhen Yu,et al.  Least Squares Temporal Difference Methods: An Analysis under General Conditions , 2012, SIAM J. Control. Optim..

[35]  Richard S. Sutton,et al.  Reinforcement Learning , 1992, Handbook of Machine Learning.