Infinite-Horizon Policy-Gradient Estimation
[1] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.
[2] R. A. Silverman, et al. Integral, Measure and Derivative: A Unified Approach, 1967.
[3] Peter Lancaster, et al. The Theory of Matrices, 1969.
[4] E. J. Sondik, et al. The Optimal Control of Partially Observable Markov Decision Processes, 1971.
[5] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon, 1973, Oper. Res.
[6] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Vol. II, 1976.
[7] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs, 1978, Oper. Res.
[8] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[9] F. R. Gantmakher. The Theory of Matrices, 1984.
[10] Peter W. Glynn, et al. Proceedings of the 1986 Winter Simulation Conference, 1986.
[11] Alan Weiss, et al. Sensitivity analysis via likelihood ratios, 1986, WSC '86.
[12] R. M. Dudley, et al. Real Analysis and Probability, 1989.
[13] Alan Weiss, et al. Sensitivity Analysis for Simulations via Likelihood Ratios, 1989, Oper. Res.
[14] R. Rubinstein. How to optimize discrete-event systems from a single sample path by the score function method, 1991.
[15] Peter W. Glynn, et al. Likelihood ratio gradient estimation for stochastic systems, 1990, CACM.
[16] Xi-Ren Cao, et al. Perturbation analysis of discrete event dynamic systems, 1991.
[17] Reuven Y. Rubinstein, et al. Decomposable score function estimators for sensitivity analysis and optimization of queueing networks, 1992, Ann. Oper. Res.
[18] Reuven Y. Rubinstein, et al. Discrete Event Systems, 1993.
[19] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[20] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[21] Michael I. Jordan, et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes, 1994, ICML.
[22] Shigenobu Kobayashi, et al. Reinforcement Learning by Stochastic Hill Climbing on Discounted Reward, 1995, ICML.
[23] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[24] Wei Zhang, et al. A Reinforcement Learning Approach to Job-Shop Scheduling, 1995, IJCAI.
[25] P. Glynn. Likelihood Ratio Gradient Estimation for Stochastic Recursions, 1995.
[26] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[27] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[28] Dimitri P. Bertsekas, et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems, 1996, NIPS.
[29] Shigenobu Kobayashi, et al. Reinforcement Learning in POMDPs with Function Approximation, 1997, ICML.
[30] Xi-Ren Cao, et al. Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization, 1998, IEEE Trans. Control. Syst. Technol.
[31] Shigenobu Kobayashi, et al. An Analysis of Actor/Critic Algorithms Using Eligibility Traces: Reinforcement Learning with Imperfect Value Function, 1998, ICML.
[32] Shigenobu Kobayashi, et al. Reinforcement learning for continuous action using stochastic gradient ascent, 1998.
[33] Andrew W. Moore, et al. Gradient Descent for General Reinforcement Learning, 1998, NIPS.
[34] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[35] Reuven Y. Rubinstein, et al. Modern Simulation and Modeling, 1998.
[36] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 1998, Proceedings of the 37th IEEE Conference on Decision and Control.
[37] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[38] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[39] Kee-Eung Kim, et al. Learning Finite-State Controllers for Partially Observable Environments, 1999, UAI.
[40] Kee-Eung Kim, et al. Learning to Cooperate via Policy Search, 2000, UAI.
[41] Arthur L. Samuel, et al. Some studies in machine learning using the game of checkers, 2000, IBM J. Res. Dev.
[42] Lex Weaver, et al. A Multi-Agent Policy-Gradient Approach to Network Routing, 2001, ICML.
[43] Peter L. Bartlett, et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[44] Peter L. Bartlett, et al. Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning, 2000, J. Comput. Syst. Sci.