Paper information - Reinforcement Learning for Continuous Stochastic Control Problems - 字舞流文

Reinforcement Learning for Continuous Stochastic Control Problems

This paper addresses Reinforcement Learning (RL) for stochastic control problems in continuous state space and continuous time. We state the Hamilton-Jacobi-Bellman equation satisfied by the value function and use a finite-difference method to design a convergent approximation scheme. We then propose an RL algorithm based on this scheme and prove its convergence to the optimal solution.

Rémi Munos | Paul Bourgine

[1] R. Galen, et al. Beyond Normality: The Predictive Value and Efficiency of Medical Diagnoses, 1975.

[2] N. Krylov. Controlled Diffusion Processes, 1980.

[3] Philip E. Gill, et al. Practical optimization, 1981.

[4] R. Bast, et al. A radioimmunoassay using a monoclonal antibody to monitor the course of epithelial ovarian cancer, 1983, The New England journal of medicine.

[5] J. Hanley, et al. A method of comparing the areas under receiver operating characteristic curves derived from the same cases, 1983, Radiology.

[6] C. D. Gelatt, et al. Optimization by Simulated Annealing, 1983, Science.

[7] Geoffrey E. Hinton, et al. Learning representations by back-propagating errors, 1986, Nature.

[8] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.

[9] Roger Fletcher, et al. Practical methods of optimization (2nd ed.), 1987.

[10] R. Fletcher. Practical Methods of Optimization, 1988.

[11] J. A. Swets, et al. Measuring the accuracy of diagnostic systems, 1988, Science.

[12] I. Jacobs, et al. A risk of malignancy index incorporating CA 125, ultrasound and menopausal status for the accurate preoperative diagnosis of ovarian cancer, 1990.

[13] D. Oram, et al. A risk of malignancy index incorporating CA 125, ultrasound and menopausal status for the accurate preoperative diagnosis of ovarian cancer, 1991, British journal of obstetrics and gynaecology.

[14] W. Fleming, et al. Controlled Markov processes and viscosity solutions, 1992.

[15] M. James. Controlled Markov processes and viscosity solutions, 1994.

[16] B. Yegnanarayana, et al. Artificial neural networks for pattern recognition, 1994.

[17] P. Kloeden, et al. Numerical Solutions of Stochastic Differential Equations, 1995.

[18] Werner Römisch, et al. Numerical Solution of Stochastic Differential Equations (Peter E. Kloeden and Eckhard Platen), 1995, SIAM Rev.

[19] Christopher M. Bishop, et al. Neural networks for pattern recognition, 1995.

[20] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.

[21] A. Tailor, et al. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis, 1997, Ultrasound in obstetrics & gynecology : the official journal of the International Society of Ultrasound in Obstetrics and Gynecology.

[22] Rémi Munos, et al. A Convergent Reinforcement Learning Algorithm in the Continuous Case Based on a Finite Difference Method, 1997, IJCAI.