Sample Complexity and Performance Bounds for Non-Parametric Approximate Linear Programming

One of the most difficult tasks in value function approximation for Markov Decision Processes is finding an approximation architecture that is expressive enough to capture the important structure in the value function while not overfitting the training samples. Recent results in non-parametric approximate linear programming (NP-ALP) have demonstrated that this can be done effectively using nothing more than a smoothness assumption on the value function. In this paper we extend these results to the case where samples come from real-world transitions instead of the full Bellman equation, adding robustness to noise. In addition, we provide the first max-norm, finite-sample performance guarantees for any form of ALP. NP-ALP is amenable to problems with large (multidimensional) or even infinite (continuous) action spaces, and does not require a model to select actions using the resulting approximate solution.

Jason Pazis, Ronald Parr
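The abstract rests on two ideas: values known only at sampled points are extended to the rest of the space through a smoothness (Lipschitz) assumption, and the Bellman constraints are written over observed transitions (s, a, r, s') rather than a model. The Python sketch below illustrates both under simplifying assumptions of our own, not taken from the paper: one-dimensional states and actions, a hypothetical distance d = |s - s_j| + |a - a_j|, a hand-chosen Lipschitz constant L, and the lower Lipschitz envelope max_j(q_j - L·d) as the extension. It is not the paper's NP-ALP formulation. In particular, it replaces an LP solver with fixed-point iteration, which under these assumptions reaches the same minimizer of sum(q), because the sample-based update is monotone and a γ-contraction in max-norm.

```python
from typing import List, Sequence, Tuple

# Hypothetical 1-D transition sample: (state, action, reward, next_state).
Sample = Tuple[float, float, float, float]


def q_hat(s: float, a: float, samples: Sequence[Sample],
          q: Sequence[float], lip: float) -> float:
    """Assumed extension: lower Lipschitz envelope max_j (q_j - L * d),
    with the assumed distance d((s, a), (s_j, a_j)) = |s - s_j| + |a - a_j|."""
    return max(q_j - lip * (abs(s - s_j) + abs(a - a_j))
               for (s_j, a_j, _, _), q_j in zip(samples, q))


def v_hat(s: float, samples: Sequence[Sample],
          q: Sequence[float], lip: float) -> float:
    """max over all actions a of q_hat(s, a); the action term is zero at
    a = a_j, so the maximum is reached at one of the sampled actions."""
    return max(q_j - lip * abs(s - s_j)
               for (s_j, _, _, _), q_j in zip(samples, q))


def solve_sample_fixed_point(samples: Sequence[Sample], gamma: float,
                             lip: float, tol: float = 1e-10,
                             max_iter: int = 100_000) -> List[float]:
    """Iterate q_i <- r_i + gamma * v_hat(s'_i) using only observed transitions.
    The update is monotone and a gamma-contraction in max-norm, so its fixed
    point is the smallest q with q_i >= r_i + gamma * (q_j - L * |s'_i - s_j|)
    for all i, j, i.e. the minimizer of sum(q) under those linear constraints."""
    q = [0.0] * len(samples)
    for _ in range(max_iter):
        new_q = [r + gamma * v_hat(s_next, samples, q, lip)
                 for (_, _, r, s_next) in samples]
        if max(abs(x - y) for x, y in zip(new_q, q)) < tol:
            return new_q
        q = new_q
    return q


def greedy_action(s: float, samples: Sequence[Sample],
                  q: Sequence[float], lip: float) -> float:
    """Model-free action choice: return the sampled action a_j of the sample
    maximizing q_j - L * |s - s_j| (no transition model is queried)."""
    best = max(range(len(samples)),
               key=lambda j: q[j] - lip * abs(s - samples[j][0]))
    return samples[best][1]


if __name__ == "__main__":
    data = [(0.0, 0.0, 1.5, 0.0), (1.0, 1.0, 2.0, 1.0)]
    q = solve_sample_fixed_point(data, gamma=0.5, lip=2.0)
    print([round(x, 6) for x in q])          # [3.0, 4.0]
    print(greedy_action(0.0, data, q, 2.0))  # 0.0
    print(greedy_action(1.0, data, q, 2.0))  # 1.0
```

With these two samples the iteration settles at q = [3.0, 4.0], and the greedy action switches from the first sample's action to the second's as s moves from 0 toward 1. Because the maximum over a of q_hat(s, a) is always attained at some sampled action a_j under this assumed extension, choosing an action needs neither a transition model nor a search over a continuous action space.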
