A basic formula for performance gradient estimation of semi-Markov decision processes
[1] Arnaud Doucet,et al. A policy gradient method for semi-Markov decision processes with application to call admission control , 2007, Eur. J. Oper. Res..
[2] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[3] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[4] Xi-Ren Cao,et al. On-Line Policy Gradient Estimation with Multi-Step Sampling , 2010, Discret. Event Dyn. Syst..
[5] Xi-Ren Cao,et al. Perturbation analysis of discrete event dynamic systems , 1991 .
[6] John N. Tsitsiklis,et al. Simulation-based optimization of Markov reward processes , 2001, IEEE Trans. Autom. Control..
[7] Xi-Ren Cao,et al. Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization , 1998, IEEE Trans. Control. Syst. Technol..
[8] Sheldon M. Ross,et al. Stochastic Processes , 2018, Gauge Integral Structures for Stochastic Calculus and Quantum Electrodynamics.
[9] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[10] Xi-Ren Cao,et al. Perturbation realization, potentials, and sensitivity analysis of Markov processes , 1997, IEEE Trans. Autom. Control..
[11] Xi-Ren Cao,et al. Perturbation analysis and optimization of queueing networks , 1983 .
[12] Vijay R. Konda,et al. On Actor-Critic Algorithms , 2003, SIAM J. Control. Optim..
[13] Xi-Ren Cao,et al. A single sample path-based performance sensitivity formula for Markov chains , 1996, IEEE Trans. Autom. Control..
[14] P. Glynn,et al. Likelihood ratio gradient estimation for stochastic recursions , 1995, Advances in Applied Probability.
[15] Xi-Ren Cao,et al. A basic formula for online policy gradient algorithms , 2005, IEEE Transactions on Automatic Control.
[16] Javier A. Barria,et al. Reinforcement Learning for Resource Allocation in LEO Satellite Networks , 2007, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[17] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[18] Xi-Ren Cao,et al. Stochastic learning and optimization - A sensitivity-based approach , 2007, Annu. Rev. Control..
[19] Andrew W. Moore,et al. Gradient Descent for General Reinforcement Learning , 1998, NIPS.
[20] Peter W. Glynn,et al. Likelihood ratio gradient estimation for stochastic systems , 1990, CACM.
[21] Xi-Ren Cao,et al. Semi-Markov decision problems and performance sensitivity analysis , 2003, IEEE Trans. Autom. Control..