Off-Policy Knowledge Maintenance for Robots

A fundamental difficulty in robotics arises from changes in the experienced environment: periods when the robot’s current situation differs from its past experience. We present an architecture whereby many independent reinforcement learning agents (or demons) observe the behaviour of a single robot. Each demon learns one piece of world knowledge, represented as a generalized value function. This architecture allows the demons to update their knowledge online and off-policy from the robot’s behaviour. We then describe one approach to active exploration using curiosity, an internal measure of learning progress, and conclude with a preliminary result showing how a robot can adapt its prediction of the time needed to come to a full stop.
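To make the architecture concrete, the following is a minimal sketch of a single demon under linear function approximation, using a GTD(lambda)-style off-policy update of the kind used in this line of work. The class shape, parameter names, and step sizes are illustrative assumptions, not the paper’s implementation.

```python
import numpy as np

class GVFDemon:
    """One demon: learns a single generalized value function (GVF)
    off-policy from the robot's behaviour with a GTD(lambda)-style
    update under linear function approximation (an illustrative
    sketch; names and defaults are assumptions)."""

    def __init__(self, n_features, alpha=0.1, beta=0.01, lam=0.9):
        self.theta = np.zeros(n_features)  # primary weights: the prediction
        self.w = np.zeros(n_features)      # auxiliary weights for the gradient correction
        self.e = np.zeros(n_features)      # eligibility trace
        self.alpha, self.beta, self.lam = alpha, beta, lam

    def predict(self, phi):
        # The demon's current answer to its predictive question.
        return self.theta @ phi

    def update(self, phi, phi_next, cumulant, gamma, gamma_next, rho):
        """One off-policy learning step.
        phi, phi_next     : feature vectors for successive time steps
        cumulant          : the signal this demon accumulates (its pseudo-reward)
        gamma, gamma_next : per-step continuation; 0 terminates the prediction
        rho               : importance ratio pi(a|s) / b(a|s) between the demon's
                            target policy and the robot's behaviour policy
        """
        delta = cumulant + gamma_next * (self.theta @ phi_next) - self.theta @ phi
        self.e = rho * (gamma * self.lam * self.e + phi)
        self.theta += self.alpha * (delta * self.e
                                    - gamma_next * (1.0 - self.lam)
                                    * (self.e @ self.w) * phi_next)
        self.w += self.beta * (delta * self.e - (self.w @ phi) * phi)
        return delta
```

For the time-to-stop prediction mentioned above, one plausible configuration (our reading, not a specification from the paper) is a cumulant of 1 on every time step, gamma set to 0 once the wheels have stopped, and a target policy that always issues the stop command; the learned value then approximates the expected number of time steps until a full stop.

The abstract does not specify how curiosity is computed. One common reading of “learning progress”, again an assumption on our part, is the recent decrease in a demon’s average prediction error, which could be tracked alongside each demon:

```python
from collections import deque

class LearningProgressCuriosity:
    """Hypothetical curiosity signal: learning progress measured as the
    drop in a demon's average absolute TD error between an older and a
    more recent window.  Positive values mean the demon is still
    improving, which can bias exploration toward the experience that
    feeds it."""

    def __init__(self, window=100):
        self.old = deque(maxlen=window)
        self.recent = deque(maxlen=window)

    def observe(self, delta):
        # Once the recent window is full, retire its oldest error
        # into the older window before recording the new one.
        if len(self.recent) == self.recent.maxlen:
            self.old.append(self.recent[0])
        self.recent.append(abs(delta))

    def curiosity(self):
        if not self.old or not self.recent:
            return 0.0
        return sum(self.old) / len(self.old) - sum(self.recent) / len(self.recent)
```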