Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model
Xian Wu | Lin F. Yang | Mengdi Wang | Yinyu Ye | Aaron Sidford
[1] Sham M. Kakade,et al. On the sample complexity of reinforcement learning , 2003.
[2] Yin Tat Lee,et al. Efficient Inverse Maintenance and Faster Algorithms for Linear Programming , 2015, 2015 IEEE 56th Annual Symposium on Foundations of Computer Science.
[3] R. Bellman,et al. Dynamic Programming and Markov Processes , 1960.
[4] Yinyu Ye,et al. A New Complexity Result on Solving the Markov Decision Problem , 2005, Math. Oper. Res..
[5] Dimitri P. Bertsekas,et al. Abstract Dynamic Programming , 2013.
[6] Yishay Mansour,et al. On the Complexity of Policy Iteration , 1999, UAI.
[7] Hilbert J. Kappen,et al. On the Sample Complexity of Reinforcement Learning with a Generative Model , 2012, ICML.
[8] Lihong Li,et al. Reinforcement Learning in Finite MDPs: PAC Analysis , 2009, J. Mach. Learn. Res..
[9] Yin Tat Lee,et al. Path Finding Methods for Linear Programming: Solving Linear Programs in Õ(√rank) Iterations and Faster Algorithms for Maximum Flow , 2014, 2014 IEEE 55th Annual Symposium on Foundations of Computer Science.
[10] Tor Lattimore,et al. PAC Bounds for Discounted MDPs , 2012, ALT.
[11] George B. Dantzig,et al. Linear programming and extensions , 1965.
[12] Vivek S. Borkar,et al. Empirical Q-Value Iteration , 2014, Stochastic Systems.
[13] Xian Wu,et al. Variance reduced value iteration and faster algorithms for solving Markov decision processes , 2017, SODA.
[14] Leslie Pack Kaelbling,et al. On the Complexity of Solving Markov Decision Problems , 1995, UAI.
[15] Peter Bro Miltersen,et al. Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor , 2010, JACM.
[16] Christoph Dann,et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning , 2015, NIPS.
[17] Bruno Scherrer,et al. Improved and Generalized Upper Bounds on the Complexity of Policy Iteration , 2013, Math. Oper. Res..
[18] P. Tseng. Solving H-horizon, stationary Markov decision problems in time proportional to log(H) , 1990.
[19] Andrew W. Moore,et al. Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems , 1999, IJCAI.
[20] Mengdi Wang,et al. Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear Running Time , 2017, ArXiv.
[21] Michael Kearns,et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms , 1998, NIPS.
[22] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.
[23] Yinyu Ye,et al. The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate , 2011, Math. Oper. Res..