H-Learning: A Reinforcement Learning Method for Optimizing Undiscounted Average Reward

In this paper, we introduce a model-based reinforcement learning method called H-learning, which optimizes undiscounted average reward. We compare it with three other reinforcement learning methods in the domain of scheduling Automatic Guided Vehicles (AGVs), transportation robots used in modern manufacturing plants and facilities. The four methods differ along two dimensions: they are either model-based or model-free, and they optimize either discounted total reward or undiscounted average reward. Our experimental results indicate that H-learning is more robust with respect to changes in the domain parameters, and in many cases converges in fewer steps to a better average reward per time step than all the other methods. An added advantage is that, unlike the other methods, it has no parameters to tune.
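To make the model-based, average-reward setting concrete, the sketch below shows one plausible tabular update in the spirit of H-learning: transition probabilities and rewards are estimated from counts, the relative values h(i) are backed up using the average-reward Bellman equation h(i) ← max_a [r(i,a) − ρ + Σ_j p(j|i,a) h(j)], and the average-reward estimate ρ is maintained as a running average over greedy steps. This is an illustrative reconstruction under those assumptions, not the paper's exact pseudocode; all identifiers are hypothetical, and the precise update ordering may differ from the published algorithm.

```python
import numpy as np

# Illustrative tabular sketch of a model-based average-reward update
# in the spirit of H-learning (not the paper's exact pseudocode).
n_states, n_actions = 5, 2  # hypothetical small MDP

counts = np.zeros((n_states, n_actions, n_states))  # transition counts
r_sum = np.zeros((n_states, n_actions))             # cumulative observed reward
h = np.zeros(n_states)                              # relative value function h(i)
rho = 0.0                                           # average-reward estimate
alpha = 1.0                                         # decaying step size for rho

def update(i, a, r, j):
    """Process one observed transition (i, a, r, j)."""
    global rho, alpha
    # Update the learned model from counts.
    counts[i, a, j] += 1
    r_sum[i, a] += r
    visits = np.maximum(counts[i].sum(axis=1), 1)       # visits per action at state i
    p = counts[i] / visits[:, None]                     # estimated p(. | i, a)
    r_hat = r_sum[i] / visits                           # estimated r(i, a)

    # Average-reward Bellman backup:
    #   h(i) <- max_a [ r(i,a) - rho + sum_j p(j|i,a) h(j) ]
    q = r_hat - rho + p @ h
    was_greedy = q[a] == q.max()
    h[i] = q.max()

    if was_greedy:
        # Keep rho as a running average of (r - h(i) + h(j)) over greedy
        # steps; decaying alpha as alpha/(1+alpha) makes it an exact mean.
        rho += alpha * (r - h[i] + h[j] - rho)
        alpha = alpha / (1.0 + alpha)
```

With the alpha decay shown, rho needs no hand-tuned learning rate, which is consistent with the abstract's claim that the method has no parameters to tune; whether this matches the paper's exact schedule is an assumption of the sketch.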