POLICY EVALUATION WITH STOCHASTIC GRADIENT ESTIMATION TECHNIQUES
G. Pedrielli | E. P. Chew | P. Lendermann | C. G. Corlu | S. Shashaani | E. Song | T. Roeder | Y. Peng | L. H. Lee | B. Feng
[1] Tikhon Jelvis,et al. Foundations of Reinforcement Learning with Applications in Finance , 2022 .
[2] M. Fu,et al. Estimating a Conditional Expectation with the Generalized Likelihood Ratio Method , 2021, 2021 Winter Simulation Conference (WSC).
[3] Martin Takác,et al. A Deep Q-Network for the Beer Game: Deep Reinforcement Learning for Inventory Optimization , 2021, Manuf. Serv. Oper. Manag..
[4] Catherine Daveloose,et al. Representations for conditional expectations and applications to pricing and hedging of financial products in Lévy and jump-diffusion setting , 2019, Stochastic Analysis and Applications.
[5] Jalaj Bhandari,et al. A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation , 2018, COLT.
[6] Michael C. Fu,et al. A New Unbiased Stochastic Derivative Estimator for Discontinuous Sample Performances with Structural Parameters , 2018, Oper. Res..
[7] Philip S. Thomas,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines , 2017, ArXiv.
[8] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .
[9] Francis A. Longstaff,et al. Valuing American Options by Simulation: A Simple Least-Squares Approach , 2001 .
[10] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[11] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[12] Ralph Neuneier,et al. Optimal Asset Allocation using Adaptive Dynamic Programming , 1995, NIPS.
[13] Gang George Yin,et al. Budget-Dependent Convergence Rate of Stochastic Approximation , 1995, SIAM J. Optim..
[14] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.
[15] Paul Glasserman,et al. Gradient Estimation Via Perturbation Analysis , 1990 .
[16] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[17] Peter W. Glynn,et al. Likelihood ratio gradient estimation: an overview , 1987, WSC '87.
[18] Y. Ho,et al. Perturbation analysis and optimization of queueing networks , 1982, 1982 21st IEEE Conference on Decision and Control.