A Reinforcement Learning Method for Maximizing Undiscounted Rewards
[1] Satinder P. Singh, et al. Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes, 1994, AAAI.
[2] Sridhar Mahadevan, et al. To Discount or Not to Discount in Reinforcement Learning: A Case Study Comparing R Learning and Q Learning, 1994, ICML.
[3] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[4] Prasad Tadepalli, et al. H-Learning: A Reinforcement Learning Method for Optimizing Undiscounted Average Reward, 1994.
[5] Leslie Pack Kaelbling, et al. Learning in embedded systems, 1993.
[6] Andrew McCallum, et al. Using Transitional Proximity for Faster Reinforcement Learning, 1992, ML.
[7] Sridhar Mahadevan, et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning, 1991, Artif. Intell.
[8] Long Ji Lin, et al. Programming Robots Using Reinforcement Learning and Teaching, 1991, AAAI.
[9] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[10] Thomas H. Westerdale, et al. Quasimorphisms or Queasymorphisms? Modeling Finite Automaton Environments, 1990, FOGA.
[11] A. Jalali, et al. Computationally efficient adaptive control algorithms for Markov chains, 1989, Proceedings of the 28th IEEE Conference on Decision and Control.
[12] C. Watkins. Learning from delayed rewards, 1989.
[13] John N. Tsitsiklis, et al. Parallel and distributed computation, 1989.
[14] A. Barto, et al. Learning and Sequential Decision Making, 1989.
[15] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[16] Rutherford Aris, et al. Discrete Dynamic Programming, 1965, The Mathematical Gazette.
[17] R. Howard. Dynamic Programming and Markov Processes, 1960.
[18] H. R. Pitt. Divergent Series, 1951, Nature.