A Note on Optimization Formulations of Markov Decision Processes