A novel Q-learning algorithm with function approximation for constrained Markov decision processes
[1] Benjamin Van Roy, et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[2] Vivek S. Borkar, et al. Learning Algorithms for Markov Decision Processes with Average Cost, 2001, SIAM J. Control. Optim.
[3] Michael C. Fu, et al. Two-timescale simultaneous perturbation stochastic approximation using deterministic perturbation sequences, 2003, TOMC.
[4] Jean C. Walrand, et al. An introduction to queueing networks, 1989, Prentice Hall International editions.
[5] A. Mas-Colell, et al. Microeconomic Theory, 1995.
[6] E. Altman. Constrained Markov Decision Processes, 1999.
[7] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS 1996.
[8] Sean P. Meyn, et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control. Optim.
[9] An Online Convergent Q-learning Algorithm with Linear Function Approximation, 2011.
[10] John N. Tsitsiklis, et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[11] Francisco S. Melo, et al. Q-Learning with Linear Function Approximation, 2007, COLT.
[12] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[13] Dimitri P. Bertsekas, et al. Approximate Dynamic Programming, 2017, Encyclopedia of Machine Learning and Data Mining.
[14] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[15] Shalabh Bhatnagar, et al. An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes, 2012, J. Optim. Theory Appl.
[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[17] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Vol. II, 1976.
[18] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[19] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[20] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Autom.
[21] James C. Spall, et al. A one-measurement form of simultaneous perturbation stochastic approximation, 1997, Autom.
[22] Shalabh Bhatnagar, et al. An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes, 2010, Syst. Control. Lett.
[23] Vivek S. Borkar, et al. An actor-critic algorithm for constrained Markov decision processes, 2005, Syst. Control. Lett.
[24] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.