Which is the best intrinsic motivation signal for learning multiple skills?

Humans and other biological agents are able to autonomously learn and cache different skills in the absence of any biological pressure or any assigned task. In this respect, Intrinsic Motivations (IMs), i.e., motivations not connected to reward-related stimuli, play a cardinal role in animal learning and can be considered a fundamental tool for developing more autonomous and more adaptive artificial agents. In this work, we provide an exhaustive analysis of a scarcely investigated problem: which kind of IM reinforcement signal is the most suitable for driving the acquisition of multiple skills in the shortest time? To this end, we implemented an artificial agent with a hierarchical architecture that allows it to learn and cache different skills. We tested the system in a setup with continuous states and actions, in particular with a kinematic robotic arm that has to learn different reaching tasks. We compared the results of different versions of the system driven by several different intrinsic motivation signals. The results show (a) that intrinsic reinforcements based purely on the knowledge of the system are not appropriate for guiding the acquisition of multiple skills, and (b) that the stronger the link between the IM signal and the competence of the system, the better the performance.
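To make the competence-based idea concrete, the sketch below illustrates one possible intrinsic motivation signal of the kind the abstract favours: the intrinsic reward for a skill is its recent competence progress (improvement in success rate), and a high-level selector uses that signal to decide which skill to practise next. This is a minimal illustration, not the authors' implementation; the skill names, window size, and softmax temperature are assumptions made for the example.

```python
# Minimal sketch (not the authors' implementation): a competence-progress
# intrinsic-motivation signal used by a high-level selector to choose which
# skill to train next. Skills whose competence is still improving keep
# attracting training time; mastered or unlearnable skills are abandoned.

import math
import random


class CompetenceProgressSelector:
    def __init__(self, skills, window=20, temperature=0.1):
        self.skills = list(skills)
        self.window = window              # trials used to estimate competence
        self.temperature = temperature    # exploration in skill selection
        self.history = {s: [] for s in self.skills}  # recent outcomes per skill

    def intrinsic_reward(self, skill):
        """Competence progress: recent success rate minus older success rate."""
        h = self.history[skill]
        if len(h) < 2 * self.window:
            return 1.0                    # optimistic start: sample every skill
        recent = sum(h[-self.window:]) / self.window
        older = sum(h[-2 * self.window:-self.window]) / self.window
        return recent - older

    def select_skill(self):
        """Softmax over intrinsic rewards: fastest-improving skills are picked most."""
        rewards = [self.intrinsic_reward(s) for s in self.skills]
        exps = [math.exp(r / self.temperature) for r in rewards]
        total = sum(exps)
        return random.choices(self.skills, weights=[e / total for e in exps], k=1)[0]

    def record(self, skill, success):
        self.history[skill].append(1.0 if success else 0.0)


if __name__ == "__main__":
    # Toy usage: three hypothetical reaching targets with different learning speeds.
    selector = CompetenceProgressSelector(["reach_A", "reach_B", "reach_C"])
    learning_rate = {"reach_A": 0.002, "reach_B": 0.005, "reach_C": 0.0}
    competence = {s: 0.1 for s in selector.skills}
    for trial in range(2000):
        skill = selector.select_skill()
        competence[skill] = min(1.0, competence[skill] + learning_rate[skill])
        success = random.random() < competence[skill]
        selector.record(skill, success)
    print({s: round(selector.intrinsic_reward(s), 3) for s in selector.skills})
```

A knowledge-based alternative would replace `intrinsic_reward` with a prediction-error term (e.g., novelty or surprise of observed states); the abstract's finding is that such purely knowledge-based signals perform worse than competence-linked ones for acquiring multiple skills.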
