An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes
[1] Vivek S. Borkar, et al. An actor-critic algorithm for constrained Markov decision processes, 2005, Syst. Control Lett.
[2] Aurel A. Lazar, et al. Optimal flow control of a class of queueing networks in equilibrium, 1983.
[3] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008.
[4] Morris W. Hirsch, et al. Convergent activation dynamics in continuous time networks, 1989, Neural Networks.
[5] Vijay R. Konda, et al. On Actor-Critic Algorithms, 2003, SIAM J. Control Optim.
[6] J. Spall. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation, 1992.
[7] J. Ben Atkinson, et al. An Introduction to Queueing Networks, 1988.
[8] Sean P. Meyn, et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control Optim.
[9] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 2001, IEEE Trans. Autom. Control.
[10] Ilya Segal, et al. Solutions Manual for Microeconomic Theory: Mas-Colell, Whinston and Green, 1997.
[11] Shalabh Bhatnagar, et al. An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes, 2010, Syst. Control Lett.
[12] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996.
[13] E. Altman. Constrained Markov Decision Processes, 1999.
[14] Andrew Zisserman, et al. Advances in Neural Information Processing Systems (NIPS), 2007.
[15] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[16] V. Borkar. Asynchronous Stochastic Approximations, 1998.
[17] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009.
[18] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Autom.
[19] P. Schweitzer. Perturbation theory and finite Markov chains, 1968.
[20] Shalabh Bhatnagar, et al. The Borkar-Meyn theorem for asynchronous stochastic approximations, 2011, Syst. Control Lett.
[21] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[22] John N. Tsitsiklis, et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[23] Pierre Priouret, et al. Adaptive Algorithms and Stochastic Approximations, 1990, Applications of Mathematics.