Successive Over Relaxation Q-Learning
Shalabh Bhatnagar | Raghuram Bharadwaj Diddigi | Chandramouli Kamanchi