A LEARNING ALGORITHM FOR DISCRETE-TIME STOCHASTIC CONTROL

A simulation-based algorithm for learning good policies for a discrete-time stochastic control process with unknown transition law is analyzed for the case in which the state and action spaces are compact subsets of Euclidean spaces. The algorithm extends the Q-learning scheme for discrete state/action problems along the lines of Baker [4]. Almost sure convergence is proved under suitable conditions.
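
For orientation, the discrete state/action Q-learning scheme that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration, not the algorithm analyzed in the paper: the two-state, two-action model (`P`, `COST`), the discount factor `GAMMA`, the uniform exploration rule, and the step-size exponent 0.7 are all assumptions chosen for the example. The learner only calls `simulate`, so the transition law stays hidden from it; `q_value_iteration` uses the model solely to provide a reference answer.

```python
import random

# Hypothetical 2-state, 2-action cost-minimization problem (illustrative only).
# P[s][a] is the probability of moving to state 0; otherwise the next state is 1.
P = {0: {0: 0.9, 1: 0.2}, 1: {0: 0.6, 1: 0.1}}
COST = {0: {0: 1.0, 1: 0.5}, 1: {0: 2.0, 1: 3.0}}
GAMMA = 0.5  # assumed discount factor
STATES = (0, 1)
ACTIONS = (0, 1)


def simulate(s, a, rng):
    """Sample the next state; the learner sees only this sample, never P."""
    return 0 if rng.random() < P[s][a] else 1


def q_learning(num_steps=100_000, seed=0):
    """Tabular Q-learning along a single simulated trajectory."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    visits = {s: {a: 0 for a in ACTIONS} for s in STATES}
    s = 0
    for _ in range(num_steps):
        a = rng.randrange(len(ACTIONS))  # uniform exploration (assumption)
        s_next = simulate(s, a, rng)
        visits[s][a] += 1
        # Step sizes n^(-0.7): their sum diverges, the sum of squares converges.
        step = visits[s][a] ** -0.7
        target = COST[s][a] + GAMMA * min(q[s_next].values())
        q[s][a] += step * (target - q[s][a])
        s = s_next
    return q


def q_value_iteration(num_sweeps=200):
    """Reference Q-values computed from the known model, for comparison only."""
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    for _ in range(num_sweeps):
        v = {s: min(q[s].values()) for s in STATES}
        q = {
            s: {
                a: COST[s][a] + GAMMA * (P[s][a] * v[0] + (1.0 - P[s][a]) * v[1])
                for a in ACTIONS
            }
            for s in STATES
        }
    return q
```

Calling `q_learning()` and `q_value_iteration()` and comparing the two tables entry by entry gives a rough empirical check of convergence on this toy problem. Handling compact state and action spaces, as the abstract describes, requires more than this table-based update.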

[1]  J. Neveu, et al. Discrete Parameter Martingales, 1975.

[2]  Lennart Ljung, et al. Analysis of recursive stochastic algorithms, 1977.

[3]  Dimitri P. Bertsekas. Distributed Computation of Fixed Points, 1981.

[4]  Morris W. Hirsch, et al. Convergent activation dynamics in continuous time networks, 1989, Neural Networks.

[5]  Pierre Priouret, et al. Adaptive Algorithms and Stochastic Approximations, 1990, Applications of Mathematics.

[6]  Michael I. Jordan, et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES, 1996.

[7]  Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[8]  V. Borkar. Distributed computation of fixed points of ∞-nonexpansive maps, 1996.

[9]  John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.

[10]  Vivek S. Borkar, et al. Multiscale Stochastic Approximation for Parametric Optimization of Hidden Markov Models, 1997, Probability in the Engineering and Informational Sciences.

[11]  V. Borkar. Stochastic approximation with two time scales, 1997.

[12]  Vivek S. Borkar, et al. Stochastic Approximation for Nonexpansive Maps: Application to Q-Learning Algorithms, 1997, SIAM J. Control Optim.

[13]  Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.

[14]  Vivek S. Borkar, et al. An analog scheme for fixed-point computation-Part II: Applications, 1999.

[15]  Vivek S. Borkar, et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes, 1999, SIAM J. Control Optim.

[16]  Sean P. Meyn, et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control Optim.

[17]  Vivek S. Borkar, et al. Learning Algorithms for Markov Decision Processes with Average Cost, 2001, SIAM J. Control Optim.