Efficient Exploration and Value Function Generalization in Deterministic Systems