Reinforcement Learning for Continuous Stochastic Control Problems

This paper addresses Reinforcement Learning (RL) for stochastic control problems with continuous state space and continuous time. We state the Hamilton-Jacobi-Bellman equation satisfied by the value function and use a finite-difference method to design a convergent approximation scheme. We then propose an RL algorithm based on this scheme and prove its convergence to the optimal solution.
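To make the finite-difference idea concrete, the following is a minimal sketch and not the paper's own scheme or its RL algorithm. It assumes a one-dimensional controlled diffusion dx = f(x,u) dt + sigma dw on [0, 1], a finite control set, a per-unit-time discount factor gamma, and fixed rewards at the two exit points. It uses a standard upwind (Markov chain approximation) discretization and solves the resulting discrete problem by model-based value iteration. All names (`fd_value_iteration`, `f`, `r`, `sigma`, `controls`, `boundary`) are hypothetical.

```python
import numpy as np

def fd_value_iteration(f, r, sigma, controls, n=51, gamma=0.9,
                       boundary=(0.0, 1.0), tol=1e-8, max_iter=100_000):
    """Upwind finite-difference value iteration on a uniform grid over [0, 1].

    f(x, u): drift, r(x, u): running reward (both vectorized over x).
    sigma:   constant noise level, assumed > 0 so the denominators are nonzero.
    boundary: rewards received when the state exits at x=0 and x=1.
    """
    xs = np.linspace(0.0, 1.0, n)
    h = xs[1] - xs[0]
    x = xs[1:-1]                      # interior grid points
    V = np.zeros(n)
    V[0], V[-1] = boundary            # boundary values stay fixed
    for _ in range(max_iter):
        best = np.full(n - 2, -np.inf)
        for u in controls:
            drift = f(x, u)
            denom = sigma**2 + h * np.abs(drift)
            tau = h**2 / denom        # interpolation time step
            # Upwind transition probabilities to the right and left
            # neighbours; by construction p_up + p_down == 1.
            p_up = (0.5 * sigma**2 + h * np.maximum(drift, 0.0)) / denom
            p_down = (0.5 * sigma**2 + h * np.maximum(-drift, 0.0)) / denom
            q = tau * r(x, u) + gamma**tau * (p_up * V[2:] + p_down * V[:-2])
            best = np.maximum(best, q)
        delta = np.max(np.abs(best - V[1:-1]))
        V[1:-1] = best
        if delta < tol:               # stop once the update is negligible
            break
    return xs, V

# Usage: steer left or right at unit speed, reward 1 for exiting at x=1.
xs, V = fd_value_iteration(f=lambda x, u: u + 0.0 * x,
                           r=lambda x, u: 0.0 * x,
                           sigma=0.5, controls=(-1.0, 1.0), n=21)
```

Because the discount gamma**tau is strictly below 1, each sweep is a contraction, so the iteration converges on the fixed grid. Whether the grid solution approaches the continuous value function as h shrinks is the kind of question the paper's convergence analysis addresses; this sketch makes no such claim. It also assumes the model (f, r, sigma) is known, whereas an RL algorithm would have to estimate the corresponding quantities from observed trajectories.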
