A Reinforcement Learning Method for Maximizing Undiscounted Rewards
