An optimization-based categorization of reinforcement learning environments

This paper proposes a categorization of reinforcement learning environments based on the optimization of a reinforcement signal over time. Environments are classified by the simplest agent that can possibly achieve optimal reinforcement. Two parameters, h and β, abstractly characterize the complexity of an agent: the ideal (h,β)-agent uses the input information provided by the environment and at most h bits of local storage to choose an action that maximizes the discounted sum of the next β reinforcements. In an (h,β)-environment, an ideal (h,β)-agent achieves the maximum possible expected reinforcement for that environment. The paper discusses in detail the special cases h = 0 and β = 1, describes some theoretical bounds on h and β, and re-explores a well-known reinforcement learning environment using this new notation.
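As an informal illustration of the agent described above, the following minimal sketch shows a memoryless agent (h = 0) that picks the action maximizing the discounted sum of its next few reinforcements. Everything in it is an assumption made for illustration and is not taken from the paper: the names `beta` (the lookahead horizon), `gamma` (a discount factor), `toy_step`, and `best_first_action`, the deterministic two-state toy environment, and the brute-force search over action sequences.

```python
from itertools import product


def discounted_sum(rewards, gamma):
    """Return sum over t of gamma**t * rewards[t]."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def best_first_action(state, step, actions, beta, gamma):
    """Brute-force lookahead: try every action sequence of length beta and
    return (first action, value) of the sequence with the largest
    discounted sum of reinforcements. Assumes a deterministic environment."""
    best_value, best_action = float("-inf"), None
    for plan in product(actions, repeat=beta):
        s, rewards = state, []
        for a in plan:
            s, r = step(s, a)  # hypothetical environment model
            rewards.append(r)
        value = discounted_sum(rewards, gamma)
        if value > best_value:
            best_value, best_action = value, plan[0]
    return best_action, best_value


def toy_step(state, action):
    """Hypothetical environment with states 0 and 1.
    State 0: 'stay' pays 1 and remains in 0; 'go' pays 0 and moves to 1.
    State 1: every action pays 5 and remains in 1."""
    if state == 0:
        return (0, 1.0) if action == "stay" else (1, 0.0)
    return (1, 5.0)


ACTIONS = ("stay", "go")

# Looking one step ahead, the small immediate reward wins.
short_sighted = best_first_action(0, toy_step, ACTIONS, beta=1, gamma=0.9)

# Looking two steps ahead, giving up the immediate reward pays off.
far_sighted = best_first_action(0, toy_step, ACTIONS, beta=2, gamma=0.9)

print(short_sighted, far_sighted)
```

In this toy setting, a horizon of one step is not enough to behave optimally, while a horizon of two steps is, which is the kind of distinction the categorization is meant to capture. The exhaustive search grows exponentially with the horizon and is meant only to make the definition concrete.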