Multiple model-based reinforcement learning for nonlinear control

This paper proposes a reinforcement learning scheme that uses multiple prediction models (multiple model-based reinforcement learning, MMRL). MMRL prepares multiple module pairs, each consisting of a prediction model that predicts the future state of the controlled object and a reinforcement learning controller that learns the control output. A "responsibility signal" is computed as a soft-max function of each prediction model's prediction error, so that it takes a larger value for the module whose prediction is more accurate. By weighting both the learning and the control output of each module with the responsibility signal, modules that handle different situations are formed. To obtain a robust modular structure without a priori knowledge, such as the number of modules or the regions they should cover, a prior responsibility signal is formulated under assumptions of spatial and temporal continuity. As an efficient implementation of MMRL, an optimal controller (MLQC) based on multiple linear prediction models and quadratic reward models is formulated. To verify the performance of MLQC, simulations of the swing-up of a single pendulum were performed. The results show that linear prediction models and the corresponding controllers are acquired by learning for the regions near the hanging-down point and the upright point of the pendulum, that the task is learned in a shorter time than with the conventional method, and that redundant modules are handled appropriately. © 2006 Wiley Periodicals, Inc. Electron Comm Jpn Pt 3, 89(9): 54–69, 2006; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjc.20266
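
As a rough sketch of the responsibility-weighted mixing described above (the function names, the Gaussian-likelihood form of the soft-max, and the parameter sigma are illustrative assumptions, not details taken from the paper), the following Python fragment computes a responsibility signal from each module's prediction error and a prior responsibility term, then mixes the modules' control outputs:

```python
import numpy as np

def responsibility_signal(pred_errors, prior, sigma=1.0):
    """Soft-max responsibility over module prediction errors.

    pred_errors : squared prediction error of each module's model
    prior       : prior responsibility (e.g., the previous step's
                  responsibility, reflecting temporal continuity)
    sigma       : error scale (illustrative parameter)
    """
    likelihood = np.asarray(prior) * np.exp(
        -np.asarray(pred_errors) / (2.0 * sigma ** 2)
    )
    return likelihood / likelihood.sum()

def mixed_control(module_outputs, responsibility):
    """Weight each module's control output by its responsibility."""
    return float(np.dot(responsibility, np.asarray(module_outputs)))

# Toy usage: module 0 predicts best, so it dominates the mixture.
errors = [0.01, 0.5, 1.2]            # squared prediction errors
prior = np.array([1 / 3, 1 / 3, 1 / 3])  # uniform prior responsibility
lam = responsibility_signal(errors, prior, sigma=0.5)
u = mixed_control([0.8, -0.2, 0.1], lam)
```

In the scheme described in the abstract, the same responsibility weights would also scale each module's learning update, so that each prediction model and its controller specialize in the region of state space where the model's predictions are accurate.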
