Online Fitted Reinforcement Learning

My paper in the main portion of the conference deals with fitted value iteration or Q-learning for offline problems, i.e., those where we have a model of the environment so that we can examine arbitrary transitions in arbitrary order. The same techniques also allow us to do Q-learning for an online problem, i.e., one where we have no model but must instead perform experiments inside the MDP to gather data. I will describe how.
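The offline/online distinction can be made concrete with a small sketch. It assumes a hypothetical three-state chain MDP and uses a lookup table as the "fitted" approximator, so each fitting step reduces to averaging the backed-up targets observed for each state-action pair. It is not necessarily the method described here. In the offline case the model is queried for every (state, action) pair in arbitrary order. In the online case the same model is treated as a black-box environment that can only be stepped from the state the agent currently occupies, and a batch of experienced transitions is collected first. The names `model`, `fitted_q_iteration`, `offline_samples`, and `collect_online` are all illustrative assumptions.

```python
import random

# Hypothetical 3-state chain MDP (an assumption for illustration):
# states 0, 1, 2; action 0 moves left, action 1 moves right.
# Entering state 2 pays reward 1; state 2 is absorbing with reward 0 thereafter.
N_STATES, N_ACTIONS, GAMMA = 3, 2, 0.9

def model(s, a):
    """Deterministic transition function: returns (reward, next_state)."""
    if s == 2:
        return 0.0, 2
    s2 = min(s + 1, 2) if a == 1 else max(s - 1, 0)
    return (1.0 if s2 == 2 else 0.0), s2

def fitted_q_iteration(samples, iters=100):
    """Fitted Q-iteration with a tabular 'fit': each sweep sets Q(s, a) to the
    average of r + GAMMA * max_a' Q(s', a') over the batch samples for (s, a).
    Pairs absent from the batch stay at 0."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(iters):
        sums, counts = {}, {}
        for s, a, r, s2 in samples:
            target = r + GAMMA * max(q[s2])
            sums[(s, a)] = sums.get((s, a), 0.0) + target
            counts[(s, a)] = counts.get((s, a), 0) + 1
        new_q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
        for (s, a), total in sums.items():
            new_q[s][a] = total / counts[(s, a)]
        q = new_q
    return q

def offline_samples():
    """Offline: with a model we may examine any transition, in any order."""
    return [(s, a, *model(s, a)) for s in range(N_STATES) for a in range(N_ACTIONS)]

def collect_online(episodes=200, steps=10, seed=0):
    """Online: no model access; we may only act from the current state,
    here with a uniformly random exploration policy, and record what happens."""
    rng = random.Random(seed)
    samples = []
    for _ in range(episodes):
        s = 0  # each experiment starts from state 0
        for _ in range(steps):
            a = rng.randrange(N_ACTIONS)
            r, s2 = model(s, a)  # used only as a black-box environment step
            samples.append((s, a, r, s2))
            s = s2
    return samples

online_batch = collect_online()
q_offline = fitted_q_iteration(offline_samples())
q_online = fitted_q_iteration(online_batch)
```

Because this toy MDP is deterministic, every sample for a given (state, action) pair yields the same target, so once exploration has visited every pair the online batch leads to the same Q-values as the offline sweep. With stochastic transitions the per-pair average would only approximate the model's expectation.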