Formalisation of metamorph Reinforcement Learning

This technical report formalises a particular Reinforcement Learning (RL) setting that we call "metamorph" RL (mRL). In this setting, the signature of the learner agent, i.e. its set of input, output and feedback slots, can change over the course of learning. RL can be viewed as signal processing, since the learner agent continuously transforms the input and feedback signals it is fed into output signals. The following formalisation is therefore concerned with the description of signals and the transformation of one signal into another. And since the agent's signature is expected to change, we must also define what a "signature" and a "signature change" are.

In the first part, we describe the mRL learning context, i.e. how the metamorph agent is embedded into its environment and interacts with it. In the second part, we describe a generic example of a metamorph learner agent: a dynamic computational graph that could, in principle, be used to control the agent. In the last part, we restate the classical RL problem, a.k.a. "maximizing feedback", in terms of this mRL formalisation.
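As a minimal illustration of the signature idea, not taken from the report itself (the `Signature` type and `change_signature` helper below are hypothetical names of our own), a signature can be sketched as the set of named input, output and feedback slots the agent exposes, and a signature change as an operation that adds or removes slots mid-learning:

```python
from dataclasses import dataclass

# Hypothetical sketch: a "signature" as the sets of named input,
# output and feedback slots exposed by the learner agent.
@dataclass(frozen=True)
class Signature:
    inputs: frozenset
    outputs: frozenset
    feedbacks: frozenset

def change_signature(sig, *, add_inputs=(), drop_inputs=()):
    """A 'signature change': input slots may appear or disappear
    over the course of learning (outputs/feedbacks analogous)."""
    return Signature(
        inputs=(sig.inputs | frozenset(add_inputs)) - frozenset(drop_inputs),
        outputs=sig.outputs,
        feedbacks=sig.feedbacks,
    )

# Example: the agent starts with a camera input and later swaps it
# for a microphone, while its output and feedback slots are unchanged.
sig0 = Signature(frozenset({"camera"}), frozenset({"motor"}), frozenset({"reward"}))
sig1 = change_signature(sig0, add_inputs=["microphone"], drop_inputs=["camera"])
print(sorted(sig1.inputs))  # → ['microphone']
```

Any controller for such an agent must then map whatever input and feedback slots the current signature provides onto its current output slots, which is what motivates the signal-centric formalisation below.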