H-Learning: A Reinforcement Learning Method for Optimizing Undiscounted Average Reward
[1] Verzekeren Naar Sparen, et al. Cambridge, 1969, Humphrey Burton: In My Own Time.
[2] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.
[3] A. Jalali, et al. Computationally efficient adaptive control algorithms for Markov chains, 1989, Proceedings of the 28th IEEE Conference on Decision and Control.
[4] Toshimi Minoura, et al. Structural Active Object Systems for Manufacturing Control, 1993.
[5] Leslie Pack Kaelbling, et al. Learning in embedded systems, 1993.
[6] Anton Schwartz, et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.
[7] Satinder P. Singh, et al. Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes, 1994, AAAI.
[8] Sridhar Mahadevan, et al. To Discount or Not to Discount in Reinforcement Learning: A Case Study Comparing R Learning and Q Learning, 1994, ICML.
[9] DoKyeong Ok. A Comparative Study of Undiscounted and Discounted Reinforcement Learning Methods, 1994.
[10] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[11] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.