Multiple Model-Based Reinforcement Learning

We propose a modular reinforcement learning architecture for nonlinear, nonstationary control tasks, which we call multiple model-based reinforcement learning (MMRL). The basic idea is to decompose a complex task into multiple domains in space and time based on the predictability of the environmental dynamics. The system is composed of multiple modules, each of which consists of a state prediction model and a reinforcement learning controller. The responsibility signal, given by the softmax function of the prediction errors, is used to weight the outputs of the multiple modules, as well as to gate the learning of the prediction models and the reinforcement learning controllers. We formulate MMRL for both the discrete-time, finite-state case and the continuous-time, continuous-state case. The performance of MMRL is demonstrated in the discrete case by a nonstationary hunting task in a grid world, and in the continuous case by a nonlinear, nonstationary control task of swinging up a pendulum with variable physical parameters.
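The responsibility mechanism can be illustrated with a minimal sketch. The snippet below assumes a Gaussian likelihood of each module's prediction error, normalized by a softmax, and uses the resulting responsibilities both to mix the modules' action outputs and to gate their learning rates; the scale parameter `sigma`, the example error values, the per-module actions, and the base learning rate are hypothetical placeholders, not values from the paper.

```python
import numpy as np

def responsibilities(pred_errors, sigma=0.5):
    """Softmax responsibility signals from per-module prediction errors.

    Assumes a Gaussian likelihood exp(-E_i^2 / (2 * sigma^2)); sigma is a
    hypothetical scale parameter controlling how sharply modules compete.
    """
    log_lik = -np.asarray(pred_errors) ** 2 / (2.0 * sigma ** 2)
    log_lik -= log_lik.max()          # subtract max for numerical stability
    lik = np.exp(log_lik)
    return lik / lik.sum()            # normalize to sum to 1

# Example: three modules; the second predicts best and so dominates.
errors = np.array([0.8, 0.1, 1.5])
lam = responsibilities(errors)

# Weight the module controllers' action outputs by responsibility ...
actions = np.array([1.0, -0.5, 2.0])  # hypothetical per-module actions
action = lam @ actions                 # responsibility-weighted action

# ... and gate each module's learning by the same signal.
base_lr = 0.1
module_lrs = base_lr * lam
```

In this sketch, a module whose prediction model fits the current dynamics well receives a responsibility near 1, so it both drives the composite action and learns fastest, while poorly predicting modules are largely frozen; this is one simple reading of how the softmax responsibility signal decomposes a nonstationary task across modules.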
