Nicolas Le Roux | Marc G. Bellemare | Dale Schuurmans | Adrien Ali Taïga | Robert Dadashi
[1] V. Klee. Some characterizations of convex polyhedra, 1959.
[2] Ronald A. Howard. Dynamic Programming and Markov Processes, 1960.
[3] A. Brøndsted. An Introduction to Convex Polytopes, 1982.
[4] Ronald J. Williams and Jing Peng. Function Optimization using Connectionist Reinforcement Learning Algorithms, 1991.
[5] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[6] Michael L. Littman et al. On the Complexity of Solving Markov Decision Problems, 1995, UAI.
[7] Dimitri P. Bertsekas. Generic rank-one corrections for value iteration in Markovian decision problems, 1995, Oper. Res. Lett.
[8] Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming, 1996, Athena Scientific.
[9] Vijay R. Konda and John N. Tsitsiklis. Actor-Critic Algorithms, 1999, NIPS.
[10] Richard S. Sutton et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[11] Yishay Mansour and Satinder Singh. On the Complexity of Policy Iteration, 1999, UAI.
[12] Sham M. Kakade. A Natural Policy Gradient, 2001, NIPS.
[13] Rémi Munos. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[14] Daniela Pucci de Farias and Benjamin Van Roy. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.
[15] Steven J. Bradtke and Andrew G. Barto. Linear Least-Squares Algorithms for Temporal Difference Learning, 1996, Machine Learning.
[16] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[17] Sean R. Eddy. What is dynamic programming?, 2004, Nature Biotechnology.
[18] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, 1998, MIT Press.
[19] Pieter-Tjerk de Boer et al. A Tutorial on the Cross-Entropy Method, 2005, Ann. Oper. Res.
[20] István Szita and András Lőrincz. Learning Tetris Using the Noisy Cross-Entropy Method, 2006, Neural Computation.
[21] Tao Wang et al. Dual Representations for Dynamic Programming and Reinforcement Learning, 2007, IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[22] Yinyu Ye. The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate, 2011, Math. Oper. Res.
[23] Tim Salimans et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, ArXiv.
[24] Marc G. Bellemare et al. A Geometric Perspective on Optimal Representations for Reinforcement Learning, 2019, NeurIPS.