Sample-Optimal Parametric Q-Learning Using Linearly Additive Features
[1] Shie Mannor,et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems , 2006, J. Mach. Learn. Res..
[2] Shalabh Bhatnagar,et al. Toward Off-Policy Learning Control with Function Approximation , 2010, ICML.
[3] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[4] Xian Wu,et al. Variance reduced value iteration and faster algorithms for solving Markov decision processes , 2017, SODA.
[5] Xian Wu,et al. Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model , 2018, NeurIPS.
[6] Richard S. Sutton,et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation , 2008, NIPS.
[7] Dimitri P. Bertsekas,et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation , 2003, Discret. Event Dyn. Syst..
[8] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994.
[9] Alessandro Lazaric,et al. Finite-sample analysis of least-squares policy iteration , 2012, J. Mach. Learn. Res..
[10] Bruno Scherrer,et al. Rate of Convergence and Error Bounds for LSTD(λ) , 2014, ICML 2015.
[11] Lihong Li,et al. Scalable Bilinear π Learning Using State and Action Features , 2018, ICML.
[12] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995.
[13] Csaba Szepesvári,et al. Error Propagation for Approximate Policy and Value Iteration , 2010, NIPS.
[14] Sham M. Kakade,et al. On the sample complexity of reinforcement learning , 2003.
[15] Hilbert J. Kappen,et al. On the Sample Complexity of Reinforcement Learning with a Generative Model , 2012, ICML.
[16] Michael Kearns,et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms , 1998, NIPS.
[17] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[18] Michael I. Jordan,et al. Reinforcement Learning with Soft State Aggregation , 1994, NIPS.
[19] Nan Jiang,et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable , 2016, ICML.
[20] Mengdi Wang,et al. State Aggregation Learning from Markov Transition Data , 2018, NeurIPS.
[21] Sanjeev Arora,et al. Learning Topic Models -- Going beyond SVD , 2012, IEEE 53rd Annual Symposium on Foundations of Computer Science.
[22] John N. Tsitsiklis,et al. Feature-based methods for large scale dynamic programming , 2004, Machine Learning.
[23] Sean P. Meyn,et al. An analysis of reinforcement learning with function approximation , 2008, ICML.
[24] David K. Smith,et al. Dynamic Programming and Optimal Control. Volume 1 , 1996.
[25] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[26] Hilbert J. Kappen,et al. Speedy Q-Learning , 2011, NIPS.
[27] Victoria Stodden,et al. When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts? , 2003, NIPS.
[28] Lihong Li,et al. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning , 2008, ICML.
[29] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..
[30] H. Kappen,et al. Reinforcement Learning with a Near Optimal Rate of Convergence , 2011.
[31] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS.
[32] Kamyar Azizzadenesheli,et al. Reinforcement Learning in Rich-Observation MDPs using Spectral Methods , 2016, arXiv:1611.03907.
[33] Satinder Singh,et al. An upper bound on the loss from approximate optimal-value functions , 1994, Machine Learning.