Deciding Which Skill to Learn When: Temporal-Difference Competence-Based Intrinsic Motivation (TD-CB-IM)

Intrinsic motivations can be defined by contrasting them with extrinsic motivations. Extrinsic motivations drive the learning of behavior that satisfies basic needs related to the organism's survival and reproduction. Intrinsic motivations, instead, serve the evolutionary function of acquiring knowledge (e.g., the capacity to predict) and competence (i.e., the capacity to do) in the absence of extrinsic motivations: this knowledge and competence can later be exploited to produce behaviors that enhance biological fitness. Knowledge-based intrinsic motivation (KB-IM) mechanisms, which guide learning on the basis of the level or change of knowledge, have been widely modeled and studied. In contrast, competence-based intrinsic motivation (CB-IM) mechanisms, which guide learning on the basis of the level or improvement of competence, have been much less investigated. The goal of this chapter is twofold. First, it aims to clarify the nature and possible roles of CB-IM mechanisms for learning, in particular in relation to the cumulative acquisition of a repertoire of skills. Second, it aims to review a specific CB-IM mechanism, the Temporal-Difference Competence-Based Intrinsic Motivation (TD-CB-IM). TD-CB-IM measures the rate of improvement in skill acquisition on the basis of the temporal-difference learning signal (TD error) used in several reinforcement learning (RL) models. The effectiveness of the mechanism is supported by an in-depth review and discussion of experiments in which a hierarchical RL model controlling a simulated navigating robot successfully exploits TD-CB-IM to decide when to train different skills in different environmental conditions.
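
The idea of using the TD error as a competence-improvement signal can be illustrated with a minimal sketch. It is not the chapter's implementation: the leaky average, the softmax selection rule, and every name and parameter value (`SkillSelector`, `rate`, `temperature`, `gamma`) are assumptions introduced here for illustration only. The sketch keeps one running average of the TD error per skill and trains more often the skills whose average is currently highest.

```python
import math
import random


def td_error(reward, value_s, value_next, gamma=0.9):
    """Standard TD(0) error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_s


class SkillSelector:
    """Hypothetical selector: tracks a leaky average of each skill's TD error
    and treats it as an intrinsic signal for choosing which skill to train."""

    def __init__(self, n_skills, rate=0.1, temperature=0.05):
        self.signal = [0.0] * n_skills  # running TD-error average per skill
        self.rate = rate                # step size of the leaky average (assumed)
        self.temperature = temperature  # softmax temperature (assumed)

    def update(self, skill, delta):
        """Move the skill's running average a fraction `rate` toward `delta`."""
        self.signal[skill] += self.rate * (delta - self.signal[skill])

    def probabilities(self):
        """Softmax over the running averages (shifted by the max for stability)."""
        top = max(self.signal)
        exps = [math.exp((s - top) / self.temperature) for s in self.signal]
        total = sum(exps)
        return [e / total for e in exps]

    def choose(self, rng=random):
        """Sample the index of the skill to train next."""
        indices = range(len(self.signal))
        return rng.choices(indices, weights=self.probabilities())[0]
```

Under these assumptions, a skill that is still improving tends to produce positive TD errors and is selected often, while a skill that is already mastered, or that cannot be learned in the current environment, produces TD errors near zero and is selected less often.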
