Biasing Approximate Dynamic Programming with a Lower Discount Factor
[1] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[2] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming, 1996.
[3] D. P. Bertsekas and S. Ioffe. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1996.
[4] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction, 1998.
[5] S. M. Kakade. A Natural Policy Gradient, NIPS, 2001.
[6] B. Van Roy et al. Tetris: A Study of Randomized Constraint Sampling, 2006.
[7] S. P. Meyn et al. Probabilistic and Randomized Methods for Design under Uncertainty, 2006.
[8] W. B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2007.
[9] L. Li et al. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning, ICML, 2008.
[10] B. Biller. Approximate Dynamic Programming for High-Dimensional Problems, 2008.