Linear Stochastic Approximation: How Far Does Constant Step-Size and Iterate Averaging Go?

In this paper we study constant stepsize averaged linear stochastic approximation. With an eye towards linear value estimation in reinforcement learning, we ask whether, for a given class of linear estimation problems, (i) a single universal constant stepsize and (ii) a C/t worst-case expected error with a class-dependent constant C > 0 can be guaranteed when the error is measured via an appropriate weighted squared norm. Such a result has recently been obtained in the context of linear least-squares regression. We give examples showing that, in general, the answer to these questions is no. On the positive side, we characterize the instance-dependent behavior of the error of these algorithms, identify conditions under which the answer to the above questions becomes positive, and in particular show instance-dependent error bounds of magnitude O(1/t) for the constant stepsize iterate-averaged versions of TD(0) and a novel variant of GTD, where the stepsize is chosen independently of the value estimation instance. Computer simulations are used to illustrate and complement the theory.
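
To make the algorithmic template concrete, here is a minimal NumPy sketch of constant stepsize linear stochastic approximation with Polyak-Ruppert iterate averaging, illustrated on a synthetic linear least-squares problem. The function names, the sampler, and the stepsize value are illustrative assumptions and not the paper's code; for TD(0), the pair (A_t, b_t) would instead be formed from the features of successive states and the observed reward.

```python
import numpy as np

def averaged_constant_step_lsa(sample_fn, dim, stepsize, num_steps):
    """Constant stepsize linear stochastic approximation with iterate averaging.

    At step t the oracle returns a random pair (A_t, b_t) with E[A_t] = A and
    E[b_t] = b. The iterate follows
        theta_{t+1} = theta_t + stepsize * (b_t - A_t theta_t),
    and the Polyak-Ruppert average of the iterates is returned.
    """
    theta = np.zeros(dim)
    theta_bar = np.zeros(dim)
    for t in range(1, num_steps + 1):
        A_t, b_t = sample_fn()
        theta = theta + stepsize * (b_t - A_t @ theta)  # constant stepsize LSA update
        theta_bar += (theta - theta_bar) / t            # running (Polyak-Ruppert) average
    return theta_bar

# Illustration on a synthetic least-squares problem: A_t = x x^T, b_t = y x
# for a random feature/response pair (x, y).
rng = np.random.default_rng(0)
dim = 5
theta_star = np.arange(1.0, 6.0)

def least_squares_sample():
    x = rng.standard_normal(dim)
    y = x @ theta_star + 0.1 * rng.standard_normal()
    return np.outer(x, x), y * x

theta_hat = averaged_constant_step_lsa(least_squares_sample, dim, stepsize=0.05, num_steps=20000)
print(np.linalg.norm(theta_hat - theta_star))  # error of the averaged iterate; should be small
```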

[1] D. Ruppert et al. Efficient Estimations from a Slowly Convergent Robbins-Monro Process, 1988.

[2] Pierre Priouret et al. Adaptive Algorithms and Stochastic Approximations, 1990, Applications of Mathematics.

[3] Boris Polyak et al. Acceleration of stochastic approximation by averaging, 1992.

[4] G. Pflug et al. Stochastic approximation and optimization of random systems, 1992.

[5] Xuan Kong et al. Adaptive Signal Processing Algorithms: Stability and Performance, 1994.

[6] László Györfi et al. A Probabilistic Theory of Pattern Recognition, 1996, Stochastic Modelling and Applied Probability.

[7] John N. Tsitsiklis et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.

[8] John N. Tsitsiklis et al. Linear stochastic approximation driven by slowly varying Markov chains, 2003, Syst. Control. Lett.

[9] Csaba Szepesvári et al. Performance of Nonlinear Approximate Adaptive Controllers, 2003.

[10] John N. Tsitsiklis et al. On Average Versus Discounted Reward Temporal-Difference Learning, 2002, Machine Learning.

[11] Richard S. Sutton et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[12] Richard S. Sutton et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation, 2008, NIPS.

[13] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008, Texts and Readings in Mathematics.

[14] Shalabh Bhatnagar et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.

[15] Csaba Szepesvári et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.

[16] Andrew G. Barto et al. Adaptive Step-Size for Online Temporal Difference Learning, 2012, AAAI.

[17] Eric Moulines et al. Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), 2013, NIPS.

[18] Dimitri P. Bertsekas et al. Stabilization of Stochastic Iterative Methods for Singular and Nearly Singular Linear Systems, 2014, Math. Oper. Res.

[19] David P. Woodruff. Sketching as a Tool for Numerical Linear Algebra, 2014, Found. Trends Theor. Comput. Sci.

[20] Ohad Shamir et al. The sample complexity of learning linear predictors with the squared loss, 2014, J. Mach. Learn. Res.

[21] Marek Petrik et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms, 2015, UAI.

[22] Shie Mannor et al. Concentration Bounds for Two Timescale Stochastic Approximation with Applications to Reinforcement Learning, 2017, arXiv.

[23] Francis R. Bach et al. Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression, 2016, J. Mach. Learn. Res.

[24] Simon S. Du et al. Stochastic Variance Reduction Methods for Policy Evaluation, 2017, ICML.