A basic formula for online policy gradient algorithms
[1] Y. Ho, et al. Perturbation analysis and optimization of queueing networks, 1982, 1982 21st IEEE Conference on Decision and Control.
[2] Xi-Ren Cao. Convergence of parameter sensitivity estimates in a stochastic experiment, 1984, The 23rd IEEE Conference on Decision and Control.
[3] Peter W. Glynn, et al. Likelihood ratio gradient estimation: an overview, 1987, WSC '87.
[4] Peter W. Glynn, et al. Optimization of Stochastic Systems via Simulation, 1989, 1989 Winter Simulation Conference Proceedings.
[5] P. Glynn. Optimization of stochastic systems via simulation, 1989, WSC '89.
[6] Alan Weiss, et al. Sensitivity Analysis for Simulations via Likelihood Ratios, 1989, Oper. Res.
[7] Xi-Ren Cao, et al. Perturbation analysis of discrete event dynamic systems, 1991.
[8] Peter W. Glynn, et al. Gradient estimation for ratios, 1991, 1991 Winter Simulation Conference Proceedings.
[9] Xi-Ren Cao, et al. Realization Probabilities: The Dynamics of Queuing Systems, 1994.
[10] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[11] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[12] Xi-Ren Cao, et al. A single sample path-based performance sensitivity formula for Markov chains, 1996, IEEE Trans. Autom. Control.
[13] Xi-Ren Cao, et al. Perturbation realization, potentials, and sensitivity analysis of Markov processes, 1997, IEEE Trans. Autom. Control.
[14] Xi-Ren Cao, et al. Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization, 1998, IEEE Trans. Control. Syst. Technol.
[15] Christos G. Cassandras, et al. Introduction to Discrete Event Systems, 1999, The Kluwer International Series on Discrete Event Dynamic Systems.
[16] Xi-Ren Cao, et al. A unified approach to Markov decision problems and performance sensitivity analysis, 2000, at - Automatisierungstechnik.
[17] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[18] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 2001, IEEE Trans. Autom. Control.
[19] Peter L. Bartlett, et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[20] John N. Tsitsiklis, et al. Approximate Gradient Methods in Policy-Space Optimization of Markov Reward Processes, 2003, Discret. Event Dyn. Syst.
[21] Xi-Ren Cao, et al. From Perturbation Analysis to Markov Decision Processes and Reinforcement Learning, 2003, Discret. Event Dyn. Syst.
[22] Xi-Ren Cao, et al. The potential structure of sample paths and performance sensitivities of Markov systems, 2004, IEEE Transactions on Automatic Control.
[23] L. Breuer. Introduction to Stochastic Processes, 2022, Statistical Methods for Climate Scientists.
[24] P. Glynn. Likelihood ratio gradient estimation: an overview, 2022.