Action Refinement in Reinforcement Learning by Probability Smoothing

In many reinforcement learning applications, the set of possible actions can be partitioned by the programmer into subsets of similar actions. This paper presents a technique for exploiting this form of prior information to speed up model-based reinforcement learning. We call it an action refinement method because it treats each subset of similar actions as a single “abstract” action early in the learning process and later “refines” the abstract action into individual actions as more experience is gathered. Our method estimates the transition probabilities P(s′|s, a) for an action a by combining the results of executions of action a with executions of other actions in the same subset of similar actions. This is a form of “smoothing” of the probability estimates that trades increased bias for reduced variance. The paper derives a formula for optimal smoothing, which shows that the degree of smoothing should decrease as the amount of data increases. Experiments show that probability smoothing is better than two simpler action refinement methods on a synthetic maze problem. Action refinement is most useful in problems, such as robotics, where training experiences are expensive.
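
The core idea can be sketched as shrinking the per-action maximum-likelihood estimate of P(s′|s, a) toward the pooled estimate over the action's similarity subset, with a mixing weight that decays as data for that action accumulates. The sketch below is illustrative only, assuming a simple fixed-strength weight lam = k / (k + n_a); the names counts_a, counts_cluster, and k are hypothetical, and this heuristic weight stands in for, but is not, the optimal smoothing formula derived in the paper.

```python
import numpy as np

def smoothed_transition_estimate(counts_a, counts_cluster, k=5.0):
    """Illustrative smoothed estimate of P(s'|s, a).

    counts_a       : transition counts for action a taken from state s
    counts_cluster : pooled counts for all actions in a's similarity subset
                     (including a itself) taken from state s
    k              : smoothing strength (hypothetical stand-in; the paper
                     derives a data-dependent optimal weight instead)
    """
    counts_a = np.asarray(counts_a, dtype=float)
    counts_cluster = np.asarray(counts_cluster, dtype=float)
    n_a = counts_a.sum()
    n_c = counts_cluster.sum()

    # Empirical estimates from the individual action and from the pooled
    # "abstract" action; fall back to uniform when there is no data.
    p_a = counts_a / n_a if n_a > 0 else np.full(len(counts_a), 1.0 / len(counts_a))
    p_c = counts_cluster / n_c if n_c > 0 else np.full(len(counts_cluster), 1.0 / len(counts_cluster))

    # The smoothing weight shrinks toward 0 as data for action a accumulates,
    # so the estimate moves from the abstract (pooled) action toward the
    # individual action -- the qualitative behavior described in the abstract.
    lam = k / (k + n_a)
    return lam * p_c + (1.0 - lam) * p_a
```

Early in learning (n_a small) the estimate is dominated by the pooled cluster statistics, which is what makes the subset behave like a single abstract action; as n_a grows the individual action's own counts take over, refining the abstract action into its members.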