Paper information - Randomized function fitting-based empirical value iteration

Randomization is notable for being much less computationally expensive than optimization while often yielding comparable numerical performance. In this paper, we consider randomized function fitting combined with empirical value iteration for approximate dynamic programming on continuous state spaces. The proposed method is universal (i.e., not problem-dependent) and yields good approximations with high probability. For the convergence analysis, we introduce a random-operator-theoretic framework that relies on a novel stochastic dominance argument. A non-asymptotic rate of convergence is obtained as a byproduct of the analysis. Numerical experiments confirm the good performance of the proposed algorithm.
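To make the idea of pairing randomized function fitting with empirical value iteration concrete, the following is a minimal sketch on a toy problem, not the algorithm analyzed in the paper. Everything specific in it is an assumption made for illustration: the one-dimensional state space [0, 1], the three-element action set, the quadratic cost, the Gaussian transition noise, the random cosine features, and the ridge-regularized least-squares fit. The function names (`random_features`, `empirical_fitted_value_iteration`, `value`) are hypothetical.

```python
import numpy as np


def random_features(states, omegas, phases):
    """Map states of shape (n, 1) to random cosine features of shape (n, J)."""
    return np.cos(states @ omegas.T + phases)


def empirical_fitted_value_iteration(n_states=200, n_features=50, n_next=20,
                                     n_iters=30, gamma=0.9, ridge=1e-3, seed=0):
    """Toy sketch: value iteration with sampled expectations and random features."""
    rng = np.random.default_rng(seed)
    actions = np.array([-0.1, 0.0, 0.1])           # assumed toy action set

    # Randomization replaces optimization twice: the states are sampled, and the
    # basis functions are drawn at random instead of being tuned.
    states = rng.uniform(0.0, 1.0, size=(n_states, 1))
    omegas = rng.normal(0.0, 5.0, size=(n_features, 1))
    phases = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    phi = random_features(states, omegas, phases)  # (n_states, n_features)
    gram = phi.T @ phi + ridge * np.eye(n_features)
    weights = np.zeros(n_features)

    for _ in range(n_iters):
        targets = np.full(n_states, np.inf)
        for a in actions:
            # Empirical Bellman operator: the expectation over next states is
            # replaced by an average over n_next simulated transitions.
            noise = rng.normal(0.0, 0.05, size=(n_states, n_next))
            nxt = np.clip(states + a + noise, 0.0, 1.0)
            phi_next = random_features(nxt.reshape(-1, 1), omegas, phases)
            v_next = (phi_next @ weights).reshape(n_states, n_next).mean(axis=1)
            cost = (states[:, 0] - 0.5) ** 2 + 0.1 * a ** 2   # assumed cost
            targets = np.minimum(targets, cost + gamma * v_next)

        # Function fitting step: only the linear weights are fitted
        # (ridge-regularized least squares, an assumption of this sketch).
        weights = np.linalg.solve(gram, phi.T @ targets)

    return weights, omegas, phases


def value(points, weights, omegas, phases):
    """Evaluate the fitted value function at the given scalar states."""
    pts = np.asarray(points, dtype=float).reshape(-1, 1)
    return random_features(pts, omegas, phases) @ weights
```

A call such as `weights, omegas, phases = empirical_fitted_value_iteration()` followed by `value([0.0, 0.5, 1.0], weights, omegas, phases)` evaluates the fitted approximation at three states. The sketch only illustrates the structure of the iteration (sample, average, minimize over actions, refit); it carries none of the guarantees stated in the abstract.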
