Dual Representations for Dynamic Programming and Reinforcement Learning
[1] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[2] Andrew G. Barto, et al. Reinforcement Learning, 1998.
[3] Benjamin Van Roy, et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.
[4] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[5] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[6] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[7] Sheldon M. Ross, et al. Introduction to Probability Models (4th ed.), 1990.
[8] Richard S. Sutton, et al. Reinforcement Learning, 1992, Handbook of Machine Learning.
[9] Andrew Y. Ng, et al. Policy Search via Density Estimation, 1999, NIPS.
[10] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[11] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[12] Sheldon M. Ross, et al. Introduction to Probability Models, 1975.
[13] Peter Dayan, et al. Improving Generalization for Temporal Difference Learning: The Successor Representation, 1993, Neural Computation.