Reinforcement learning of multiple tasks using parametric bias

We propose a reinforcement learning system designed to learn multiple tasks with continuous state and action spaces. The system has been tested on a family of space-searching tasks akin to the Morris water maze, but with obstacles. While exploring a task, the agent builds an internal model of the environment and approximates a state value function. To learn multiple tasks, we use a parametric bias switching mechanism in which the value of the parametric bias layer identifies the current task for the agent. Each task has its own parametric bias vector, and during training these vectors self-organize to reflect the structure of the relationships between tasks in the task set. This mapping of the task set to parametric bias space can later be used to generate novel behaviors of the agent.
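The parametric bias idea can be illustrated with a minimal sketch. It is not the system described above: it assumes a small feedforward value network, fixed regression targets in place of temporal-difference targets, and hypothetical names throughout (PBValueNet, pb_dim, lr_pb). All tasks share the network weights, each task owns one parametric bias vector that is fed in alongside the state, and the error gradient with respect to that input is used to adapt the vector of the active task only.

```python
import numpy as np

class PBValueNet:
    """Shared value network V(state, pb) with one learnable PB vector per task.

    Illustrative only: architecture, sizes, and learning rates are assumptions.
    """

    def __init__(self, state_dim, pb_dim, n_tasks, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.state_dim = state_dim
        self.W1 = rng.normal(0.0, 0.5, (hidden, state_dim + pb_dim))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.5, hidden)
        self.b2 = 0.0
        # One PB vector per task; all start at zero and separate during training.
        self.pb = np.zeros((n_tasks, pb_dim))

    def value(self, state, pb):
        # The PB vector is simply concatenated with the state as extra input.
        x = np.concatenate([state, pb])
        h = np.tanh(self.W1 @ x + self.b1)
        return float(self.w2 @ h + self.b2), x, h

    def update(self, task, state, target, lr=0.02, lr_pb=0.02):
        v, x, h = self.value(state, self.pb[task])
        err = v - target
        # Backpropagate the squared error through the network to its input.
        dh = err * self.w2 * (1.0 - h ** 2)
        dx = self.W1.T @ dh
        # Shared weights are updated for every task.
        self.w2 -= lr * err * h
        self.b2 -= lr * err
        self.W1 -= lr * np.outer(dh, x)
        self.b1 -= lr * dh
        # Only the active task's PB vector receives the input gradient.
        self.pb[task] -= lr_pb * dx[self.state_dim:]
        return 0.5 * err ** 2

# Toy usage: two "tasks" that assign opposite values to the same states.
rng = np.random.default_rng(1)
net = PBValueNet(state_dim=2, pb_dim=2, n_tasks=2)
targets = [1.0, -1.0]  # stand-ins for task-specific value targets
for _ in range(500):
    for task in (0, 1):
        s = rng.uniform(0.0, 1.0, 2)
        net.update(task, s, targets[task])

# A PB vector between the learned ones can be queried like any other.
blend = 0.5 * (net.pb[0] + net.pb[1])
v_blend = net.value(np.array([0.5, 0.5]), blend)[0]
```

Because the task identity enters only through the PB input, evaluating the network at a vector that was never assigned to any task, such as blend above, is one simple way to probe the parametric bias space for behavior between the trained tasks.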
