Learning Optimal Policies with State Aggregation

Among the issues that arise when function approximation is combined with Reinforcement Learning (RL) algorithms, one of the most relevant is that optimality of the learned policy is no longer guaranteed. In this work, we focus on state aggregation, in which several states are grouped into a single macrostate. We propose a class of algorithms that compute bounds on the optimal action values of each state in a macrostate and use these bounds to determine whether the optimal policy can be learned over the given aggregation.
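
To make the role of the bounds concrete, the following is a minimal sketch of one possible check, not the algorithms proposed in this work. It assumes that, for every state, a lower and an upper bound on each optimal action value are already available, and it asks whether a single action is certified greedy in every state of a macrostate. The names `common_optimal_action`, `bounds`, and `macrostate` are hypothetical and introduced only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: bounds[state][action] = (lower, upper) bound on Q*(state, action).
Bounds = Dict[str, List[Tuple[float, float]]]


def common_optimal_action(bounds: Bounds, macrostate: List[str]) -> Optional[int]:
    """Return an action certified optimal in every state of the macrostate, or None.

    An action a is certified in state s when its lower bound is at least the
    upper bound of every other action in s. This is an illustrative criterion,
    assumed here; it is not taken from the source.
    """
    n_actions = len(bounds[macrostate[0]])
    for a in range(n_actions):
        certified_everywhere = all(
            bounds[s][a][0] >= max(
                (bounds[s][b][1] for b in range(n_actions) if b != a),
                default=float("-inf"),
            )
            for s in macrostate
        )
        if certified_everywhere:
            return a
    # No single action is certified for all aggregated states under these bounds.
    return None


# Illustrative, made-up bounds for three states and two actions.
example_bounds: Bounds = {
    "s1": [(0.8, 1.0), (0.1, 0.5)],
    "s2": [(0.7, 0.9), (0.2, 0.6)],
    "s3": [(0.0, 0.2), (0.5, 0.9)],
}
compatible = common_optimal_action(example_bounds, ["s1", "s2"])
conflicting = common_optimal_action(example_bounds, ["s1", "s3"])
```

In this sketch, a returned action means that a policy defined over macrostates can act optimally in all of the aggregated states, while `None` only means that the given bounds do not certify any single action; tighter bounds might change the answer.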