Simulation-Based Optimization Algorithms for Finite-Horizon Markov Decision Processes
[1] Michael C. Fu, et al. Optimal structured feedback policies for ABR flow control using two-timescale SPSA, 2001, TNET.
[2] John N. Tsitsiklis, et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[3] Shalabh Bhatnagar, et al. A simultaneous perturbation stochastic approximation-based actor-critic algorithm for Markov decision processes, 2004, IEEE Transactions on Automatic Control.
[4] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[5] Harold J. Kushner, et al. Stochastic Approximation Methods for Constrained and Unconstrained Systems, 1978.
[6] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[7] Shalabh Bhatnagar, et al. Adaptive Newton-based multivariate smoothed functional algorithms for simulation optimization, 2007, TOMC.
[8] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[9] Jonathan Chin. Cisco Frame Relay Solutions Guide, 2004.
[10] David C. Parkes, et al. Approximately Efficient Online Mechanism Design, 2004, NIPS.
[11] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[12] Nicolò Cesa-Bianchi, et al. Finite-Time Regret Bounds for the Multiarmed Bandit Problem, 1998, ICML.
[13] J. Tsitsiklis, et al. Convergence rate of linear two-time-scale stochastic approximation, 2004, math/0405287.
[14] Mark A. Shayman, et al. Multitime scale Markov decision processes, 2003, IEEE Trans. Autom. Control.
[15] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[16] Frédérick Garcia, et al. A Learning Rate Analysis of Reinforcement Learning Algorithms in Finite-Horizon, 1998, ICML.
[17] V. Nollau. Review of Kushner, H. J. and Clark, D. S., Stochastic Approximation Methods for Constrained and Unconstrained Systems (Applied Mathematical Sciences 26), Springer-Verlag, Berlin-Heidelberg-New York, 1978, 1980.
[18] Vijay R. Konda, et al. On Actor-Critic Algorithms, 2003, SIAM J. Control Optim.
[19] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[20] Vivek S. Borkar, et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes, 1999, SIAM J. Control Optim.
[21] A. Shwartz, et al. Handbook of Markov decision processes: methods and applications, 2002.
[22] S. Marcus, et al. An asymptotically efficient algorithm for finite horizon stochastic dynamic programming problems, 2003, 42nd IEEE International Conference on Decision and Control.
[23] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[24] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 2001, IEEE Trans. Autom. Control.
[25] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[26] Michael C. Fu, et al. An Asymptotically Efficient Simulation-Based Algorithm for Finite Horizon Stochastic Dynamic Programming, 2007, IEEE Transactions on Automatic Control.
[27] Yasemin Serin. A nonlinear programming model for partially observable Markov decision processes: Finite horizon case, 1995.
[28] J. Neveu, et al. Discrete Parameter Martingales, 1975.
[29] Chelsea C. White, et al. A Hybrid Genetic/Optimization Algorithm for Finite-Horizon, Partially Observed Markov Decision Processes, 2004, INFORMS J. Comput.
[30] David C. Parkes, et al. An MDP-Based Approach to Online Mechanism Design, 2003, NIPS.
[31] Morris W. Hirsch, et al. Convergent activation dynamics in continuous time networks, 1989, Neural Networks.
[32] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[33] Sheldon M. Ross. Introduction to Probability Models, 1995.
[34] Sheldon M. Ross, et al. Introduction to Probability Models, Eighth Edition, 2003.
[35] J. Spall. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation, 1992.
[36] Shalabh Bhatnagar, et al. Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes, 2007, Discret. Event Dyn. Syst.
[37] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[38] Shalabh Bhatnagar, et al. Adaptive multivariate three-timescale stochastic approximation algorithms for simulation based optimization, 2005, TOMC.
[39] James C. Spall, et al. A one-measurement form of simultaneous perturbation stochastic approximation, 1997, Autom.
[40] Michael C. Fu, et al. Two-timescale simultaneous perturbation stochastic approximation using deterministic perturbation sequences, 2003, TOMC.
[41] László Gerencsér, et al. Optimization over discrete sets via SPSA, 1999, WSC '99.
[42] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[43] J. Baras, et al. A Hierarchical Structure for Finite Horizon Dynamic Programming Problems, 2000.
[44] Carlos S. Kubrusly, et al. Stochastic approximation algorithms and applications, 1973, CDC.
[45] M. C. Fu, et al. A Markov decision process model for capacity expansion and allocation, 1999, Proceedings of the 38th IEEE Conference on Decision and Control.
[46] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Athena Scientific.