Improving state-action space exploration in reinforcement learning using geometric properties

Learning a model, or learning a policy that optimizes some objective function, relies on data sets that describe the behavior of the system. When such data are unavailable or insufficient, additional data may be generated through new experiments (if feasible) or through simulations (if an accurate model is available). In this paper we describe a third alternative, based on the availability of a qualitative model of the physical system. In particular, we show how the number of experiments required for reinforcement learning can be reduced by leveraging geometric properties of the system. These geometric properties are independent of any particular instantiation of the qualitative model. As an illustrative example, we apply our approach to a cart-pole system.
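To make the general idea concrete (this is an illustrative sketch, not the paper's specific construction), the snippet below shows one way a geometric property, the left-right mirror symmetry of cart-pole dynamics, can reduce the number of physical experiments: every observed transition yields a second valid transition for free. The function names are hypothetical, and the state layout assumes the common cart-pole convention [x, x_dot, theta, theta_dot] with a binary push-left/push-right action.

```python
import numpy as np

def mirror_transition(state, action, reward, next_state):
    """Reflect a cart-pole transition through the vertical plane.

    The cart-pole equations of motion are invariant under negating
    the cart position/velocity and pole angle/angular velocity while
    swapping the push-left and push-right actions, so the mirrored
    tuple is also a valid sample of the system's behavior. (Assumes
    a reward that is itself symmetric, as in the standard task.)
    """
    mirrored_state = -np.asarray(state)
    mirrored_next = -np.asarray(next_state)
    mirrored_action = 1 - action  # swap push-left (0) and push-right (1)
    return mirrored_state, mirrored_action, reward, mirrored_next

def augment(transitions):
    """Return the original transitions plus their mirror images,
    doubling the data obtained from each real experiment."""
    out = list(transitions)
    out.extend(mirror_transition(*t) for t in transitions)
    return out

# Example: one observed transition yields two training samples.
s = np.array([0.10, -0.20, 0.05, 0.30])      # [x, x_dot, theta, theta_dot]
s_next = np.array([0.096, -0.15, 0.056, 0.25])
data = augment([(s, 1, 1.0, s_next)])
```

Because the symmetry holds for any parameter values (cart mass, pole length, and so on), such augmentation depends only on the qualitative model, not on a particular instantiation of it.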