Unsupervised Self-Development in a Multi-Reward Environment

Self-development is an important quality for artificial agents, allowing them to acquire new skills or improve existing ones. In this contribution we analyze this problem for a scenario with multiple reward sources, some easier to reach than others. No sequence of tasks is provided to enforce self-development; rather, the agent must have an intrinsic motivation to discover more difficult reward sources even when a trivial one is always at hand. Development performance can then be measured by removing the simple reward sources. We describe the scenario and discuss, as well as measure, the applicability of standard learning methods. Based on this analysis we present two techniques that enable the desired self-development: a learning rule for quick trajectory learning and a multi-model learning approach for multiple reward sources. Simulations demonstrate the validity of the presented methods.
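To make the evaluation setup concrete, the following minimal sketch models the kind of scenario described above: one trivial reward source near the start and one difficult source far away, with the option of disabling the trivial source to test whether the agent has developed beyond it. All names, values, and the environment structure are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: a toy multi-reward environment, not the paper's setup.
# A 1-D chain where a small reward sits next to the start and a larger reward
# lies at the far end; the trivial source can be removed for evaluation.
import random

class MultiRewardChain:
    def __init__(self, length=20, trivial_pos=1, hard_pos=19,
                 trivial_reward=0.1, hard_reward=1.0):
        self.length = length
        self.trivial_pos = trivial_pos          # easy reward next to the start
        self.hard_pos = hard_pos                # difficult reward at the far end
        self.trivial_reward = trivial_reward
        self.hard_reward = hard_reward
        self.trivial_enabled = True             # disable to measure development
        self.reset()

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):                     # action: -1 (left) or +1 (right)
        self.pos = min(max(self.pos + action, 0), self.length - 1)
        reward = 0.0
        if self.pos == self.trivial_pos and self.trivial_enabled:
            reward = self.trivial_reward
        elif self.pos == self.hard_pos:
            reward = self.hard_reward
        return self.pos, reward

# Evaluation idea from the abstract: remove the simple source and check
# whether the agent still collects reward, i.e. has learned the hard one.
env = MultiRewardChain()
env.trivial_enabled = False
state = env.reset()
for _ in range(30):
    state, r = env.step(random.choice([-1, 1]))  # placeholder random policy
```

A purely reward-greedy learner would settle on the trivial source and collect nothing once it is removed; the intrinsic motivation argued for in the abstract is what should drive exploration toward the harder source despite the easy reward being always available.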