An online primal-dual method for discounted Markov decision processes
[1] Yunmei Chen, et al. Optimal Primal-Dual Methods for a Class of Saddle Point Problems, 2013, SIAM J. Optim.
[2] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[3] Bo Liu, et al. Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces, 2014, arXiv.
[4] Bo Liu, et al. Sparse Q-learning with Mirror Descent, 2012, UAI.
[5] Pierre Priouret, et al. Adaptive Algorithms and Stochastic Approximations, 1990, Applications of Mathematics.
[6] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008.
[7] Randy Cogill, et al. Primal-dual algorithms for discounted Markov decision processes, 2015, 2015 European Control Conference (ECC).
[8] John N. Tsitsiklis, et al. Parallel and distributed computation, 1989.
[9] Benjamin Van Roy, et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.
[10] David K. Smith, et al. Dynamic Programming and Optimal Control. Volume 1, 1996.
[11] John N. Tsitsiklis, et al. Neuro-dynamic programming: an overview, 1995, Proceedings of 1995 34th IEEE Conference on Decision and Control.
[12] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[13] Bo Liu, et al. Regularized Off-Policy TD-Learning, 2012, NIPS.
[14] Yinyu Ye, et al. The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate, 2011, Math. Oper. Res.
[15] Sujin Kim, et al. The stochastic root-finding problem: Overview, solutions, and open questions, 2011, TOMC.
[16] Boris Polyak, et al. Acceleration of stochastic approximation by averaging, 1992.
[17] Raghu Pasupathy, et al. Simulation Optimization: A Concise Overview and Implementation Guide, 2013.
[18] Dale Schuurmans, et al. Dual Temporal Difference Learning, 2009, AISTATS.
[19] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[20] Mohammad Gheshlaghi Azar, et al. On the theory of reinforcement learning: methods, convergence analysis and sample complexity, 2012.
[21] Yuan Tian, et al. Understanding intra-urban trip patterns from taxi trajectory data, 2012, Journal of Geographical Systems.
[22] Peter L. Bartlett, et al. Linear Programming for Large-Scale Markov Decision Problems, 2014, ICML.
[23] Guanghui Lan, et al. Randomized First-Order Methods for Saddle Point Optimization, 2014, arXiv:1409.8625.
[24] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[25] Michael H. Veatch, et al. Approximate Linear Programming for Average Cost MDPs, 2013, Math. Oper. Res.
[26] R. Rubinstein, et al. An Efficient Stochastic Approximation Algorithm for Stochastic Saddle Point Problems, 2005.