Paper information - On the Stability of Random Matrix Product with Markovian Noise: Application to Linear Stochastic Approximation and TD Learning

This paper studies the exponential stability of random matrix products driven by a Markov chain on a general (possibly unbounded) state space. Such stability is a cornerstone in the analysis of stochastic algorithms in machine learning (e.g. for parameter tracking in online learning or reinforcement learning). Existing results impose strong conditions, such as uniform boundedness of the matrix-valued functions and uniform ergodicity of the Markov chain. Our main contribution is an exponential stability result for the p-th moment of the random matrix product, provided that (i) the underlying Markov chain satisfies a super-Lyapunov drift condition, and (ii) the growth of the matrix-valued functions is controlled by an appropriately defined function (related to the drift condition). Using this result, we give finite-time p-th moment bounds for constant and decreasing stepsize linear stochastic approximation schemes with Markovian noise on a general state space. We illustrate these findings for linear value-function estimation in reinforcement learning, and we provide finite-time p-th moment bounds for various members of the temporal difference (TD) family of algorithms.
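To make the central object concrete, here is a minimal sketch of a random matrix product driven by a Markov chain. It is not taken from the paper. The two-state chain `P`, the discount `GAMMA`, the step size `ALPHA`, and the one-hot (tabular) features are all hypothetical choices made for illustration. Each factor has the form I - alpha * A(z), where A(z) = phi(s) (phi(s) - gamma * phi(s'))^T is the matrix that appears in a TD(0) update with linear features. The sketch simulates the chain and records the Frobenius norm of the running product.

```python
import math
import random

# Hypothetical toy setting (not from the paper): a two-state Markov chain,
# a discount factor, and a constant step size.
P = [[0.9, 0.1], [0.5, 0.5]]   # transition probabilities P[s][s']
GAMMA = 0.9                    # discount factor
ALPHA = 0.05                   # constant step size


def next_state(state, rng):
    """Sample the next state of the two-state chain."""
    return 0 if rng.random() < P[state][0] else 1


def td_matrix(s, s_next):
    """A(z) = phi(s) (phi(s) - GAMMA * phi(s'))^T with one-hot features phi."""
    a = [[0.0, 0.0], [0.0, 0.0]]
    a[s][s] += 1.0
    a[s][s_next] -= GAMMA
    return a


def matmul(x, y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def frobenius(x):
    """Frobenius norm of a 2x2 matrix."""
    return math.sqrt(sum(v * v for row in x for v in row))


def product_norms(n_steps, seed=0):
    """Return the norms of the running product of (I - ALPHA * A(z_k))."""
    rng = random.Random(seed)
    prod = [[1.0, 0.0], [0.0, 1.0]]
    s = 0
    norms = []
    for _ in range(n_steps):
        s_next = next_state(s, rng)
        a = td_matrix(s, s_next)
        factor = [[(1.0 if i == j else 0.0) - ALPHA * a[i][j]
                   for j in range(2)] for i in range(2)]
        prod = matmul(factor, prod)  # the newest factor multiplies on the left
        norms.append(frobenius(prod))
        s = s_next
    return norms


if __name__ == "__main__":
    norms = product_norms(10_000)
    print("norm after 1 step:", norms[0])
    print("norm after 10000 steps:", norms[-1])
```

In this toy setting an individual factor need not shrink every vector, yet the norm of the product decays as more factors accumulate. The abstract's stability result concerns this kind of decay, in p-th moment and for far more general chains and matrix-valued functions than this bounded two-state example.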
