A General Convergence Method for Reinforcement Learning in the Continuous Case

In this paper, we propose a general method for designing convergent Reinforcement Learning algorithms in the case of continuous state-space and time variables. The method is based on the discretization of the continuous process by convergent approximation schemes: the Hamilton-Jacobi-Bellman equation is replaced by a Dynamic Programming (DP) equation for some Markovian Decision Process (MDP). If the data of the MDP were known, we could compute the value function by applying DP updating rules. However, in the Reinforcement Learning (RL) approach, the state dynamics as well as the reinforcement functions are a priori unknown, which makes it impossible to apply DP rules directly.
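For illustration only, the discretization step can be sketched in a standard discounted, deterministic setting (this is not the particular scheme developed in the paper; the dynamics $f$, reinforcement $r$, discount rate $\rho$, time step $\tau$, and interpolation probabilities $p$ are assumed notation). The HJB equation
\[
  \rho\, V(x) \;=\; \max_{a}\Big[\, r(x,a) + \nabla V(x)\cdot f(x,a) \,\Big]
\]
is replaced, after discretization at resolution $\delta$, by a DP equation for an MDP on the discrete states $\xi$:
\[
  V^{\delta}(\xi) \;=\; \max_{a}\Big[\, \tau(\xi,a)\, r(\xi,a)
  \;+\; e^{-\rho\,\tau(\xi,a)} \sum_{\zeta} p(\zeta\mid\xi,a)\, V^{\delta}(\zeta) \,\Big].
\]
When $f$, $r$, and hence $p$ and $\tau$ are known, this fixed-point equation can be solved by DP updating rules (e.g. value iteration); in the RL setting these quantities must instead be estimated from observed transitions.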