Paper information - Two Timescale Stochastic Approximation with Controlled Markov noise

Two Timescale Stochastic Approximation with Controlled Markov noise

We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by 'controlled' Markov noise. In particular, both the faster and slower recursions have non-additive controlled Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions in both time-scales that are defined in terms of the ergodic occupation measures associated with the controlled Markov processes. Finally, we present a solution to the off-policy convergence problem for temporal difference learning with linear function approximation, using our results.
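To make the setting in the abstract concrete, here is a minimal toy sketch of a coupled two time-scale recursion. It is not the paper's algorithm or its off-policy TD application; every name, drift function, step size, and noise model below is an assumption chosen for illustration. A two-state Markov chain whose flip probability depends on the slow iterate stands in for the 'controlled' Markov noise, it enters both drifts multiplicatively (so it is non-additive), and Gaussian terms play the role of martingale difference noise. Because the chain is symmetric, its stationary law is uniform for any fixed flip probability, so the averaged drifts are 2*theta - w (fast) and 2 - w (slow), with the single equilibrium theta = 1, w = 2.

```python
import random


def run_two_timescale(num_steps=50_000, seed=0):
    """Toy two time-scale recursion with controlled Markov noise.

    Hypothetical illustration only. Returns the final (theta, w).
    """
    rng = random.Random(seed)
    theta, w = 0.0, 0.0   # slow and fast iterates
    z = 0                 # state of the two-state Markov noise chain

    for n in range(1, num_steps + 1):
        a_n = 1.0 / n          # slow step size
        b_n = 1.0 / n ** 0.6   # fast step size, so a_n / b_n -> 0

        # "Controlled" noise: the transition kernel depends on the iterate.
        # The chain flips with the same probability from either state, so
        # its stationary distribution stays uniform on {0, 1}.
        flip_prob = 0.1 + 0.1 / (1.0 + theta * theta)
        if rng.random() < flip_prob:
            z = 1 - z

        # Non-additive noise: a state-dependent gain of 0.5 or 1.5
        # (stationary mean 1.0) multiplies each drift term.
        gain = 1.0 + 0.5 * (2 * z - 1)

        # Martingale difference noise (i.i.d. zero-mean Gaussian here).
        m_slow = rng.gauss(0.0, 0.1)
        m_fast = rng.gauss(0.0, 0.1)

        # Both updates use the previous values of (theta, w).
        new_theta = theta + a_n * ((2.0 - w) * gain + m_slow)
        new_w = w + b_n * ((2.0 * theta - w) * gain + m_fast)
        theta, w = new_theta, new_w

    return theta, w


if __name__ == "__main__":
    theta, w = run_two_timescale()
    print(f"theta = {theta:.3f}, w = {w:.3f}")
```

In this toy the fast iterate w tracks 2*theta while theta drifts slowly, which is the time-scale separation the analysis relies on. The paper's framework is more general: there the limiting dynamics are differential inclusions defined through ergodic occupation measures rather than a single averaged ODE.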

S. Bhatnagar | Prasenjit Karmakar
