Reinforcement learning algorithms that assimilate and accommodate skills with multiple tasks

Children are capable of acquiring a large repertoire of motor skills and of efficiently adapting them to novel conditions. In previous work we proposed a hierarchical modular reinforcement learning model (RANK) that can learn multiple motor skills in continuous action and state spaces. The model is based on the mixture-of-experts architecture, suitably adapted to work with reinforcement learning. In particular, the model uses a high-level gating network to assign responsibilities for acting and for learning to a set of low-level expert networks. The model was also designed to exploit the Piagetian mechanisms of assimilation and accommodation to support the learning of multiple tasks. This paper proposes a new model (TERL, Transfer Expert Reinforcement Learning) that substantially improves on RANK. The key difference with respect to the previous model is the decoupling of the mechanisms that generate the experts' responsibility signals for learning and for control, which makes it possible to satisfy different constraints for functioning and for learning. We test both the TERL and RANK models with a two-DOF dynamic arm engaged in solving multiple reaching tasks, and compare them with a simple, flat reinforcement learning model. The results show that both models can exploit assimilation and accommodation processes to transfer knowledge between similar tasks while avoiding catastrophic interference. Furthermore, the TERL model significantly outperforms the RANK model thanks to its faster and more stable specialization of experts.
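The abstract's key architectural idea, a gating network that assigns one set of responsibilities for control and a decoupled set for learning, can be sketched as follows. This is a minimal illustration only: the weight matrices, linear experts, and softmax gating below are hypothetical stand-ins, since the abstract does not give the model's actual equations.

```python
import numpy as np

rng = np.random.default_rng(0)
N_EXPERTS, STATE_DIM, ACTION_DIM = 3, 4, 2

# Hypothetical gating parameters: one set per responsibility signal.
W_control = rng.normal(size=(N_EXPERTS, STATE_DIM))
W_learning = rng.normal(size=(N_EXPERTS, STATE_DIM))

def softmax(x):
    z = np.exp(x - x.max())  # shift for numerical stability
    return z / z.sum()

def responsibilities(state):
    """Decoupled responsibility signals (TERL's key idea, sketched).

    r_control selects which experts act now; r_learning selects which
    experts receive the learning update for this state.
    """
    r_control = softmax(W_control @ state)
    r_learning = softmax(W_learning @ state)
    return r_control, r_learning

# Hypothetical experts: simple linear state-to-action maps.
experts = rng.normal(size=(N_EXPERTS, ACTION_DIM, STATE_DIM))

def act(state):
    """Responsibility-weighted mixture of the experts' actions."""
    r_control, _ = responsibilities(state)
    actions = np.einsum("eas,s->ea", experts, state)  # each expert's action
    return r_control @ actions

state = rng.normal(size=STATE_DIM)
action = act(state)
assert action.shape == (ACTION_DIM,)
```

In a single-responsibility model such as RANK, `r_control` and `r_learning` would be one signal; keeping them separate lets the gating satisfy different constraints for acting and for credit assignment, which the abstract identifies as the source of TERL's faster and more stable expert specialization.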
