Scalable Bilinear π Learning Using State and Action Features
[1] Benjamin Van Roy, et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.
[2] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS 1996.
[3] Shalabh Bhatnagar, et al. Toward Off-Policy Learning Control with Function Approximation, 2010, ICML.
[4] F. d'Epenoux, et al. A Probabilistic Production and Inventory Problem, 1963.
[5] Manfred K. Warmuth, et al. Exponentiated Gradient Versus Gradient Descent for Linear Predictors, 1997, Inf. Comput.
[6] Dimitri P. Bertsekas, et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation, 2003, Discret. Event Dyn. Syst.
[7] Benjamin Van Roy, et al. On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming, 2004, Math. Oper. Res.
[8] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[9] Bo Liu, et al. Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces, 2014, ArXiv.
[10] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[11] Hilbert J. Kappen, et al. On the Sample Complexity of Reinforcement Learning with a Generative Model, 2012, ICML.
[12] Le Song, et al. Learning from Conditional Distributions via Dual Embeddings, 2016, AISTATS.
[13] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[14] Roy Fox, et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[15] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.
[16] Sean P. Meyn, et al. An analysis of reinforcement learning with function approximation, 2008, ICML '08.
[17] R. Sutton, et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation, 2008, NIPS 2008.
[18] David K. Smith, et al. Dynamic Programming and Optimal Control. Volume 1, 1996.
[19] Lihong Li, et al. Stochastic Variance Reduction Methods for Policy Evaluation, 2017, ICML.
[20] Le Song, et al. Boosting the Actor with Dual Critic, 2017, ICLR.
[21] Gavin Taylor, et al. Value Function Approximation in Noisy Environments Using Locally Smoothed Regularized Approximate Linear Programs, 2012, UAI.
[22] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[23] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[24] Mengdi Wang, et al. Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning, 2016, ArXiv.
[25] Dale Schuurmans, et al. Dual Temporal Difference Learning, 2009, AISTATS.
[26] Shalabh Bhatnagar, et al. A Linearly Relaxed Approximate Linear Program for Markov Decision Processes, 2017, IEEE Transactions on Automatic Control.
[27] Tao Wang, et al. Stable Dual Dynamic Programming, 2007, NIPS.
[28] Thomas M. Cover, et al. Elements of Information Theory, 2005.
[29] P. Schweitzer, et al. Generalized polynomial approximations in Markovian decision processes, 1985.
[30] Mengdi Wang, et al. Primal-Dual π Learning: Sample Complexity and Sublinear Run Time for Ergodic Markov Decision Problems, 2017, ArXiv.
[31] Ali H. Sayed, et al. Distributed Policy Evaluation Under Multiple Behavior Strategies, 2013, IEEE Transactions on Automatic Control.
[32] Thomas G. Dietterich, et al. PAC optimal MDP planning with application to invasive species management, 2015, J. Mach. Learn. Res.
[33] Mengdi Wang, et al. Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear Running Time, 2017, ArXiv.
[34] Mengdi Wang, et al. An online primal-dual method for discounted Markov decision processes, 2016, 2016 IEEE 55th Conference on Decision and Control (CDC).
[35] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[36] Charles Elkan, et al. Reinforcement Learning with a Bilinear Q Function, 2011, EWRL.
[37] Peter L. Bartlett, et al. Linear Programming for Large-Scale Markov Decision Problems, 2014, ICML.
[38] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[39] Thomas G. Dietterich, et al. PAC Optimal Planning for Invasive Species Management: Improved Exploration for Reinforcement Learning from Simulator-Defined MDPs, 2013, AAAI.