Non-Parametric Approximate Linear Programming for MDPs

The Approximate Linear Programming (ALP) approach to value function approximation for MDPs is a parametric method, in that it represents the value function as a linear combination of features chosen a priori. Choosing these features is a difficult problem in its own right. One recent effort, Regularized Approximate Linear Programming (RALP), addresses it by combining a large initial set of features with an L1 regularization penalty that favors a smooth value function with few non-zero weights. Rather than using smoothness indirectly, as a means of addressing the feature selection problem, this paper starts from the smoothness assumption and develops a non-parametric approach to ALP that is consistent with it. We show that this new approach has favorable practical and analytical properties compared with (R)ALP.
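For context, a minimal sketch of the underlying optimization (standard notation assumed here rather than taken from the paper: feature matrix $\Phi$, weight vector $w$, rewards $R$, transition kernel $P$, discount factor $\gamma$, and state-relevance weights $\rho$). The ALP computes an approximate value function $\tilde{V} = \Phi w$ by solving

\[
\min_{w} \;\; \sum_{s} \rho(s)\,\Phi(s) w
\qquad \text{s.t.} \qquad
\Phi(s) w \;\ge\; R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\,\Phi(s') w
\quad \forall (s,a),
\]

while RALP adds an L1 constraint $\lVert w \rVert_{1} \le \psi$ (or, equivalently, an L1 penalty in the objective), which drives most feature weights to zero and implicitly favors a smooth value function.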
