PAC-optimal, Non-parametric Algorithms and Bounds for Exploration in Concurrent MDPs with Delayed Updates
[1] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[2] Ronald J. Williams, et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions, 1993.
[3] Csaba Szepesvári, et al. Model-based reinforcement learning with nearly tight exploration complexity bounds, 2010, ICML.
[4] Michael L. Littman, et al. A theoretical analysis of Model-Based Interval Estimation, 2005, ICML.
[5] Jonathan P. How, et al. Sample Efficient Reinforcement Learning with Gaussian Processes, 2014, ICML.
[6] Noga Alon, et al. The space complexity of approximating the frequency moments, 1996, STOC '96.
[7] Tor Lattimore, et al. PAC Bounds for Discounted MDPs, 2012, ALT.
[8] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[9] John Langford, et al. Exploration in Metric State Spaces, 2003, ICML.
[10] Claude-Nicolas Fiechter, et al. Efficient reinforcement learning, 1994, COLT '94.
[11] Pieter Abbeel, et al. Safe Exploration in Markov Decision Processes, 2012, ICML.
[12] Andrew Y. Ng, et al. Near-Bayesian exploration in polynomial time, 2009, ICML '09.
[13] Lihong Li, et al. PAC model-free reinforcement learning, 2006, ICML.
[14] Peter Stone, et al. TEXPLORE: real-time sample-efficient reinforcement learning for robots, 2012, Machine Learning.
[15] Michael L. Littman, et al. A unifying framework for computational reinforcement learning theory, 2009.
[16] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.
[17] Peter Stone, et al. Model-Based Exploration in Continuous State Spaces, 2007, SARA.
[18] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[19] Nicholas Roy, et al. Provably Efficient Learning with Typed Parametric Models, 2009, J. Mach. Learn. Res.
[20] Michael L. Littman, et al. Multi-resolution Exploration in Continuous Spaces, 2008, NIPS.
[21] Peter Stone, et al. RTMBA: A Real-Time Model-Based Reinforcement Learning Architecture for robot control, 2011, 2012 IEEE International Conference on Robotics and Automation.
[22] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.
[23] Ronald Ortner, et al. Online Regret Bounds for Undiscounted Continuous Reinforcement Learning, 2012, NIPS.
[24] Peter Stone, et al. Intrinsically motivated model learning for a developing curious agent, 2012, 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL).
[25] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[26] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[27] Colin McDiarmid, et al. Surveys in Combinatorics, 1989: On the method of bounded differences, 1989.
[28] Timothy A. Mann. Scaling Up Reinforcement Learning without Sacrificing Optimality by Constraining Exploration, 2012.
[29] Michael L. Littman, et al. Online Linear Regression and Its Application to Model-Based Reinforcement Learning, 2007, NIPS.
[30] Jason Pazis, et al. PAC Optimal Exploration in Continuous Space Markov Decision Processes, 2013, AAAI.
[31] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[32] Kazuo Tanaka, et al. An approach to fuzzy control of nonlinear systems: stability and design issues, 1996, IEEE Trans. Fuzzy Syst.
[33] Emma Brunskill, et al. Concurrent PAC RL, 2015, AAAI.