Direct Gradient-Based Reinforcement Learning: I. Gradient Estimation Algorithms
[1] Xi-Ren Cao, et al. Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization, 1998, IEEE Trans. Control Syst. Technol.
[2] P. Marbach. Simulation-Based Methods for Markov Decision Processes, 1998.
[3] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[4] Satinder Singh, et al. An Upper Bound on the Loss from Approximate Optimal-Value Functions, 2004, Machine Learning.
[5] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 2001, IEEE Trans. Autom. Control.
[6] F. De Bruyne, et al. Iterative controller optimization for nonlinear systems, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[7] Shigenobu Kobayashi, et al. Reinforcement Learning in POMDPs with Function Approximation, 1997, ICML.
[8] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[9] T. Bukowski, et al. Integral, 2019, Healthcare Protection Management.
[10] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[11] Ronald J. Williams, et al. Analysis of Some Incremental Variants of Policy Iteration: First Steps Toward Understanding Actor-Cr, 1993.
[12] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[13] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[14] S. Gunnarsson, et al. A convergent iterative restricted complexity control design scheme, 1994, Proceedings of the 33rd IEEE Conference on Decision and Control.
[15] Andrew W. Moore, et al. Gradient Descent for General Reinforcement Learning, 1998, NIPS.
[16] Wei Zhang, et al. A Reinforcement Learning Approach to Job-Shop Scheduling, 1995, IJCAI.
[17] R. A. Silverman, et al. Integral, Measure and Derivative: A Unified Approach, 1967.
[18] John G. Kemeny, et al. Finite Markov Chains, 1960.
[19] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.
[20] Robert R. Bitmead, et al. Direct iterative tuning via spectral analysis, 2000, Automatica.
[21] Michael I. Jordan, et al. Reinforcement Learning with Soft State Aggregation, 1994, NIPS.
[22] J. Baxter, et al. Direct gradient-based reinforcement learning, 2000, Proceedings of the 2000 IEEE International Symposium on Circuits and Systems: Emerging Technologies for the 21st Century (IEEE Cat. No. 00CH36353).
[23] Peter Lancaster, et al. The Theory of Matrices, 1969.
[24] Dimitri P. Bertsekas, et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems, 1996, NIPS.
[25] Xi-Ren Cao, et al. Perturbation realization, potentials, and sensitivity analysis of Markov processes, 1997, IEEE Trans. Autom. Control.
[26] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Vol. II, 1976.