How Do We Approach Intrinsic Motivation Computationally?
What is the energy function guiding behavior and learning? Representation-based approaches like maximum entropy, generative models, sparse coding, or slowness principles can account for unsupervised learning of biologically observed structure in sensory systems from raw sensory data. However, they do not relate to behavior. Behavior-based approaches like reinforcement learning explain animal behavior in well-described situations. However, they rely on high-level representations which they cannot extract from raw sensory data. Combinations of multiple goal functions seem to be the methodology of choice for understanding the complexity of the brain. But what is the set of possible goals?
Focusing on the reinforcement learning framework, this question is addressed in the article “What is intrinsic motivation? A typology of computational approaches” by Pierre-Yves Oudeyer and Frederic Kaplan. It lists and classifies equations which extend the traditional concept of a “reward function”. Behavior is driven not only by external rewards such as food but also by a variety of intrinsic motivations. Some of these are aimed at exploration and so ensure delivery of rich sensory data, aiding unsupervised learning through active data acquisition, where the learning progress of the sensory system becomes the goal.
A novice reader may first want to become familiar with an example of a motivation function implemented in a model and applied in some scenario. A fun example is Schmidhuber (2006), which would be classified as “Learning Progress Motivation” (LPM) in the article of Oudeyer and Kaplan. The model consists of a predictor and a controller, aka critic and actor, respectively. The critic is a sensory system that rewards the actor whenever the critic's own learning progresses. The actor hence learns to act in such a way that the critic is presented with data that lead to the critic's learning progress. The actor's parameters can thus be learned by a reinforcement learning algorithm. The structure, parameters and learning paradigm of the critic are not specified, but unsupervised learning, such as learning to predict, would be suitable.
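The actor-critic loop described above can be made concrete with a small toy simulation. The following is only a minimal sketch under stated assumptions, not the model of Schmidhuber (2006): the three activities, the linear predictor, the windowed measure of learning progress, the epsilon-greedy actor, and all names (`learning_progress`, `Predictor`, `Actor`, `run`) are hypothetical choices made for illustration.

```python
import random


def learning_progress(errors, window=5):
    """Drop in mean squared error between the previous and the latest window."""
    if len(errors) < 2 * window:
        return 0.0  # not enough history to estimate progress yet
    older = errors[-2 * window:-window]
    newer = errors[-window:]
    return sum(older) / window - sum(newer) / window


class Predictor:
    """Critic (assumed form): online linear model y ~ w * x, delta-rule updates."""

    def __init__(self, lr=0.2):
        self.w = 0.0
        self.lr = lr
        self.errors = []  # history of squared prediction errors

    def update(self, x, y):
        err = y - self.w * x
        self.errors.append(err * err)
        self.w += self.lr * err * x
        return learning_progress(self.errors)  # used as the intrinsic reward


class Actor:
    """Controller (assumed form): epsilon-greedy over average intrinsic reward."""

    def __init__(self, activities, rng, epsilon=0.2, step=0.1):
        self.values = {name: 0.0 for name in activities}
        self.rng = rng
        self.epsilon = epsilon
        self.step = step

    def choose(self):
        names = list(self.values)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(names)  # explore
        return max(names, key=self.values.get)  # exploit

    def learn(self, name, reward):
        # Exponential moving average of the reward received for this activity.
        self.values[name] += self.step * (reward - self.values[name])


def run(steps=500, seed=0):
    rng = random.Random(seed)
    # Hypothetical activities: learnable, unlearnable noise, already mastered.
    activities = {
        "learnable": lambda x: 2.0 * x,
        "noise": lambda x: rng.gauss(0.0, 1.0),
        "trivial": lambda x: 0.0,
    }
    critics = {name: Predictor() for name in activities}
    actor = Actor(activities, rng)
    counts = {name: 0 for name in activities}
    for _ in range(steps):
        name = actor.choose()
        x = rng.uniform(-1.0, 1.0)
        y = activities[name](x)
        reward = critics[name].update(x, y)  # critic rewards its own progress
        actor.learn(name, reward)
        counts[name] += 1
    return counts, critics


if __name__ == "__main__":
    counts, critics = run()
    print(counts)
    print({name: round(c.w, 3) for name, c in critics.items()})
```

In this sketch the reward is not the prediction error itself but its decrease over time. An activity that is already mastered therefore yields no reward, and pure noise yields none on average, which is the property that distinguishes learning progress from plain novelty seeking.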
The broad overview of intrinsic motivation functions offered by Oudeyer and Kaplan leads to novel ways of conceptualizing and gaining new insights into the variety of computational mechanisms driving behavior and learning. A possible extension of the typology could include goal functions of unsupervised learning. Then an assessment of the relations between all relevant goal functions may provide a well-founded systems view of the brain.
[1] Jürgen Schmidhuber, et al. Optimal Artificial Curiosity, Creativity, Music, and the Fine Arts, 2005.