PAC-optimal, Non-parametric Algorithms and Bounds for Exploration in Concurrent MDPs with Delayed Updates

By Jason Pazis
Department of Computer Science, Duke University

Approved: Ronald Parr (Supervisor), Vincent Conitzer, George Konidaris, Mauro Maggioni, Peter Stone

An abstract of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science in the Graduate School of Duke University, 2015.

Copyright © 2015 by Jason Pazis. All rights reserved except the rights granted by the Creative Commons Attribution-Noncommercial License.

Jason Pazis

[1] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.

[2] Ronald J. Williams, et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions, 1993.

[3] Csaba Szepesvári, et al. Model-based reinforcement learning with nearly tight exploration complexity bounds, 2010, ICML.

[4] Michael L. Littman, et al. A theoretical analysis of Model-Based Interval Estimation, 2005, ICML.

[5] Jonathan P. How, et al. Sample Efficient Reinforcement Learning with Gaussian Processes, 2014, ICML.

[6] Noga Alon, et al. The space complexity of approximating the frequency moments, 1996, STOC '96.

[7] Tor Lattimore, et al. PAC Bounds for Discounted MDPs, 2012, ALT.

[8] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.

[9] John Langford, et al. Exploration in Metric State Spaces, 2003, ICML.

[10] Claude-Nicolas Fiechter, et al. Efficient reinforcement learning, 1994, COLT '94.

[11] Pieter Abbeel, et al. Safe Exploration in Markov Decision Processes, 2012, ICML.

[12] Andrew Y. Ng, et al. Near-Bayesian exploration in polynomial time, 2009, ICML '09.

[13] Lihong Li, et al. PAC model-free reinforcement learning, 2006, ICML.

[14] Peter Stone, et al. TEXPLORE: real-time sample-efficient reinforcement learning for robots, 2012, Machine Learning.

[15] Michael L. Littman, et al. A unifying framework for computational reinforcement learning theory, 2009.

[16] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.

[17] Peter Stone, et al. Model-Based Exploration in Continuous State Spaces, 2007, SARA.

[18] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[19] Nicholas Roy, et al. Provably Efficient Learning with Typed Parametric Models, 2009, J. Mach. Learn. Res.

[20] Michael L. Littman, et al. Multi-resolution Exploration in Continuous Spaces, 2008, NIPS.

[21] Peter Stone, et al. RTMBA: A Real-Time Model-Based Reinforcement Learning Architecture for robot control, 2011, 2012 IEEE International Conference on Robotics and Automation.

[22] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.

[23] Ronald Ortner, et al. Online Regret Bounds for Undiscounted Continuous Reinforcement Learning, 2012, NIPS.

[24] Peter Stone, et al. Intrinsically motivated model learning for a developing curious agent, 2012, 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL).

[25] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[26] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.

[27] Colin McDiarmid, et al. Surveys in Combinatorics, 1989: On the method of bounded differences, 1989.

[28] Timothy A. Mann. Scaling Up Reinforcement Learning without Sacrificing Optimality by Constraining Exploration, 2012.

[29] Michael L. Littman, et al. Online Linear Regression and Its Application to Model-Based Reinforcement Learning, 2007, NIPS.

[30] Jason Pazis, et al. PAC Optimal Exploration in Continuous Space Markov Decision Processes, 2013, AAAI.

[31] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.

[32] Kazuo Tanaka, et al. An approach to fuzzy control of nonlinear systems: stability and design issues, 1996, IEEE Trans. Fuzzy Syst.

[33] Emma Brunskill, et al. Concurrent PAC RL, 2015, AAAI.