A General Convergence Method for Reinforcement Learning in the Continuous Case

In this paper, we propose a general method for designing convergent Reinforcement Learning algorithms in the case of continuous state-space and time variables. The method is based on the discretization of the continuous process by convergent approximation schemes: the Hamilton-Jacobi-Bellman equation is replaced by a Dynamic Programming (DP) equation for some Markovian Decision Process (MDP). If the data of the MDP were known, we could compute the value function by applying DP updating rules. However, in the Reinforcement Learning (RL) approach, the state dynamics as well as the reinforcement functions are a priori unknown, which makes it impossible to apply DP rules directly.
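For illustration only, the discretization step can be sketched in a standard discounted, deterministic setting (this is not the particular scheme developed in the paper; the dynamics $f$, reinforcement $r$, discount rate $\rho$, time step $\tau$, and interpolation probabilities $p$ are assumed notation). The HJB equation
\[
  \rho\, V(x) \;=\; \max_{a}\Big[\, r(x,a) + \nabla V(x)\cdot f(x,a) \,\Big]
\]
is replaced, after discretization at resolution $\delta$, by a DP equation for an MDP on the discrete states $\xi$:
\[
  V^{\delta}(\xi) \;=\; \max_{a}\Big[\, \tau(\xi,a)\, r(\xi,a)
  \;+\; e^{-\rho\,\tau(\xi,a)} \sum_{\zeta} p(\zeta\mid\xi,a)\, V^{\delta}(\zeta) \,\Big].
\]
When $f$, $r$, and hence $p$ and $\tau$ are known, this fixed-point equation can be solved by DP updating rules (e.g. value iteration); in the RL setting these quantities must instead be estimated from observed transitions.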