PAC-optimal, Non-parametric Algorithms and Bounds for Exploration in Concurrent MDPs with Delayed Updates

By Jason Pazis, Department of Computer Science, Duke University. Approved: Ronald Parr (Supervisor), Vincent Conitzer, George Konidaris, Mauro Maggioni, Peter Stone.

An abstract of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science in the Graduate School of Duke University, 2015.

Copyright © 2015 by Jason Pazis. All rights reserved except the rights granted by the Creative Commons Attribution-Noncommercial License.

[1] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.

[2] Ronald J. Williams, et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions, 1993.

[3] Csaba Szepesvári, et al. Model-based reinforcement learning with nearly tight exploration complexity bounds, 2010, ICML.

[4] Michael L. Littman, et al. A theoretical analysis of Model-Based Interval Estimation, 2005, ICML.

[5] Jonathan P. How, et al. Sample Efficient Reinforcement Learning with Gaussian Processes, 2014, ICML.

[6] Noga Alon, et al. The space complexity of approximating the frequency moments, 1996, STOC '96.

[7] Tor Lattimore, et al. PAC Bounds for Discounted MDPs, 2012, ALT.

[8] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.

[9] John Langford, et al. Exploration in Metric State Spaces, 2003, ICML.

[10] Claude-Nicolas Fiechter, et al. Efficient reinforcement learning, 1994, COLT '94.

[11] Pieter Abbeel, et al. Safe Exploration in Markov Decision Processes, 2012, ICML.

[12] Andrew Y. Ng, et al. Near-Bayesian exploration in polynomial time, 2009, ICML '09.

[13] Lihong Li, et al. PAC model-free reinforcement learning, 2006, ICML.

[14] Peter Stone, et al. TEXPLORE: real-time sample-efficient reinforcement learning for robots, 2012, Machine Learning.

[15] Michael L. Littman, et al. A unifying framework for computational reinforcement learning theory, 2009.

[16] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.

[17] Peter Stone, et al. Model-Based Exploration in Continuous State Spaces, 2007, SARA.

[18] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[19] Nicholas Roy, et al. Provably Efficient Learning with Typed Parametric Models, 2009, J. Mach. Learn. Res.

[20] Michael L. Littman, et al. Multi-resolution Exploration in Continuous Spaces, 2008, NIPS.

[21] Peter Stone, et al. RTMBA: A Real-Time Model-Based Reinforcement Learning Architecture for robot control, 2011, 2012 IEEE International Conference on Robotics and Automation.

[22] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.

[23] Ronald Ortner, et al. Online Regret Bounds for Undiscounted Continuous Reinforcement Learning, 2012, NIPS.

[24] Peter Stone, et al. Intrinsically motivated model learning for a developing curious agent, 2012, 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL).

[25] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[26] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.

[27] Colin McDiarmid, et al. Surveys in Combinatorics, 1989: On the method of bounded differences, 1989.

[28] Timothy A. Mann. Scaling Up Reinforcement Learning without Sacrificing Optimality by Constraining Exploration, 2012.

[29] Michael L. Littman, et al. Online Linear Regression and Its Application to Model-Based Reinforcement Learning, 2007, NIPS.

[30] Jason Pazis, et al. PAC Optimal Exploration in Continuous Space Markov Decision Processes, 2013, AAAI.

[31] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.

[32] Kazuo Tanaka, et al. An approach to fuzzy control of nonlinear systems: stability and design issues, 1996, IEEE Trans. Fuzzy Syst.

[33] Emma Brunskill, et al. Concurrent PAC RL, 2015, AAAI.