The Asymptotic Convergence-Rate of Q-learning
[1] F. Downton. Stochastic Approximation, 1969, Nature.
[2] H. C. Tijms, et al. A modified form of the iterative method of undiscounted dynamic programming, 1973.
[3] A. Hordijk, et al. A Modified Form of the Iterative Method of Dynamic Programming, 1975.
[4] C. Watkins. Learning from delayed rewards, 1989.
[5] Michael I. Jordan, et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES, 1996.
[6] Sridhar Mahadevan, et al. To Discount or Not to Discount in Reinforcement Learning: A Case Study Comparing R Learning and Q Learning, 1994, ICML.
[7] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[8] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[9] Michael I. Jordan, et al. Reinforcement Learning with Soft State Aggregation, 1994, NIPS.
[10] Csaba Szepesvári, et al. A Generalized Reinforcement-Learning Model: Convergence and Applications, 1996, ICML.
[11] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[12] Csaba Szepesvári, et al. Generalized Markov Decision Processes: Dynamic-programming and Reinforcement-learning Algorithms, 1996.