Instructing a Reinforcement Learner

In reinforcement learning (RL), rewards have traditionally been treated as the most important feedback for understanding the environment. Recently, however, there have been interesting forays into other modes of feedback, such as sporadic supervisory inputs, which bring richer information about the world of interest into the learning process. In this paper, we model these supervisory inputs as specific types of instructions that provide information in the form of an expert’s control decision and certain structural regularities in the state space. We further provide a mathematical formulation of these instructions and propose a framework for incorporating them into the learning process.