Unsupervised Reinforcement Learning
Conventionally, reinforcement learning algorithms are goal-directed: they aim to acquire policies that most effectively maximize a given reward signal. However, if we consider agents that must master very large repertoires of behaviors – such as general-purpose robots that must perform a diverse array of tasks in the real world – then it makes sense to instead frame the reinforcement learning process as an unsupervised learning procedure, which has the aim of extracting a large and diverse array of skills that can later be utilized for the many tasks that the agents may be asked to perform. Such a formulation not only makes it feasible to acquire diverse behaviors before any reward signal is actually observed, but can also make learning much more tractable for tasks with delayed or sparse reward signals. In this talk, I will discuss recent advances in unsupervised reinforcement learning, many of which draw on an information-theoretic formulation of the unsupervised skill acquisition problem. I will discuss how this formulation can provide us with a principled view of unsupervised skill acquisition, and furthermore provides some tantalizing clues about how to quantify the usefulness of learned behaviors. I will also present experimental results showing that unsupervised reinforcement learning not only provides good results in a variety of simpler simulated environments, but in fact can be utilized with real-world robotic systems to learn sophisticated behaviors with minimal human input.
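The abstract does not spell out the information-theoretic objective it refers to, but a representative instance from this line of work (the DIAYN objective of Eysenbach et al., 2018, given here only as an illustrative example) maximizes the mutual information between a latent skill variable $Z$ and the states $S$ visited by the skill-conditioned policy $\pi_\theta$, bounded from below with a learned discriminator $q_\phi$:

\[
I(S; Z) = H(Z) - H(Z \mid S) \;\geq\; \mathbb{E}_{z \sim p(z),\; s \sim \pi_\theta(\cdot \mid z)}\big[\log q_\phi(z \mid s) - \log p(z)\big].
\]

Under this formulation, each skill $z$ is trained with the intrinsic reward $r(s, z) = \log q_\phi(z \mid s) - \log p(z)$, so skills are driven to visit states that make them distinguishable from one another, without any external reward signal.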