[1] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[2] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res..
[3] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res..
[4] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[5] Marcello Restelli, et al. Boosted Fitted Q-Iteration, 2017, ICML.
[6] Csaba Szepesvári, et al. Finite time bounds for sampling based fitted value iteration, 2005, ICML.
[7] Satinder Singh, et al. An upper bound on the loss from approximate optimal-value functions, 1994, Machine Learning.
[8] Thomas J. Walsh, et al. Towards a Unified Theory of State Abstraction for MDPs, 2006, AI&M.
[9] Bernardo Ávila Pires, et al. Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models, 2016, COLT.
[10] Dan Lizotte, et al. Convergent Fitted Value Iteration with Linear Function Approximation, 2011, NIPS.
[11] Lihong Li, et al. Scalable Bilinear π Learning Using State and Action Features, 2018, ICML.
[12] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[13] Yishay Mansour, et al. Approximate Equivalence of Markov Decision Processes, 2003, COLT.
[14] Csaba Szepesvári, et al. Statistical linear estimation with penalized estimators: an application to reinforcement learning, 2012, ICML.
[15] John N. Tsitsiklis, et al. Feature-based methods for large scale dynamic programming, 2004, Machine Learning.
[16] Nan Jiang, et al. Abstraction Selection in Model-based Reinforcement Learning, 2015, ICML.
[17] A. Müller. Integral Probability Metrics and Their Generating Classes of Functions, 1997, Advances in Applied Probability.
[18] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[19] Rémi Munos, et al. Performance Bounds in Lp-norm for Approximate Value Iteration, 2007, SIAM J. Control. Optim..
[20] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res..
[21] Csaba Szepesvári, et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, 2006, Machine Learning.
[22] Ward Whitt, et al. Approximations of Dynamic Programs, I, 1978, Math. Oper. Res..
[23] Shie Mannor, et al. Regularized Policy Iteration with Nonparametric Function Spaces, 2016, J. Mach. Learn. Res..
[24] Michael L. Littman, et al. Near Optimal Behavior via Approximate State Abstraction, 2016, ICML.
[25] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[26] Alessandro Lazaric, et al. Finite-sample analysis of least-squares policy iteration, 2012, J. Mach. Learn. Res..
[27] Matteo Hessel, et al. Deep Reinforcement Learning and the Deadly Triad, 2018, ArXiv.
[28] Marcus Hutter, et al. Extreme State Aggregation beyond MDPs, 2014, ALT.
[29] Nan Jiang, et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable, 2016, ICML.
[30] Alessandro Lazaric, et al. Finite-sample Analysis of Bellman Residual Minimization, 2010, ACML.
[31] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[32] Csaba Szepesvári, et al. Regularization in reinforcement learning, 2011.
[33] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[34] Zhuoran Yang, et al. A Theoretical Analysis of Deep Q-Learning, 2019, L4DC.
[35] Le Song, et al. SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation, 2017, ICML.
[36] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[37] Nan Jiang, et al. Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches, 2018, COLT.
[38] Katja Hofmann, et al. The Malmo Platform for Artificial Intelligence Experimentation, 2016, IJCAI.
[39] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[40] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[41] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[42] A. Barto, et al. An algebraic approach to abstraction in reinforcement learning, 2004.
[43] Michael L. Littman, et al. A unifying framework for computational reinforcement learning theory, 2009.
[44] Qiang Liu, et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation, 2018, NeurIPS.
[45] John Langford, et al. PAC Reinforcement Learning with Rich Observations, 2016, NIPS.
[46] Craig Boutilier, et al. Non-delusional Q-learning and value-iteration, 2018, NeurIPS.
[47] Csaba Szepesvári, et al. Error Propagation for Approximate Policy and Value Iteration, 2010, NIPS.
[48] Nan Jiang, et al. On Oracle-Efficient PAC RL with Rich Observations, 2018, NeurIPS.