Paper information - GRAIL: A Goal-Discovering Robotic Architecture for Intrinsically-Motivated Learning

In this paper, we present the goal-discovering robotic architecture for intrinsically-motivated learning (GRAIL), a four-level architecture that is able to autonomously: 1) discover changes in the environment; 2) form representations of the goals corresponding to those changes; 3) select the goal to pursue on the basis of intrinsic motivations (IMs); 4) select suitable computational resources to achieve the selected goal; 5) monitor the achievement of the selected goal; and 6) self-generate a learning signal when the selected goal is successfully achieved. Building on previous research, GRAIL exploits the power of goals and competence-based IMs to autonomously explore the world and learn different skills that allow the robot to modify the environment. To highlight the features of GRAIL, we implement it in a simulated iCub robot and test the system in four different experimental scenarios in which the agent has to perform reaching tasks within a 3-D environment.
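The six capabilities listed in the abstract can be pictured as a simple loop. The following is only a minimal toy sketch under stated assumptions, not the authors' implementation: the class `GoalRecord`, the functions `discover_goal`, `select_goal`, `update_after_attempt`, and `run`, the softmax temperature, the learning rate, and the two example changes with their success probabilities are all hypothetical names and values chosen for illustration. A running average of the absolute competence prediction error stands in for a competence-based IM signal, and step 4 (selection of computational resources) is left out for brevity.

```python
import math
import random


class GoalRecord:
    """Hypothetical bookkeeping for one discovered goal."""

    def __init__(self, change):
        self.change = change      # environmental change that defines the goal
        self.competence = 0.0     # running estimate of the success probability
        self.improvement = 1.0    # optimistic initial intrinsic-motivation signal


def discover_goal(goals, change):
    """Steps 1-2: store a goal representation the first time a change is seen."""
    if change not in goals:
        goals[change] = GoalRecord(change)
    return goals[change]


def select_goal(goals, rng, temperature=0.1):
    """Step 3: softmax choice over the (assumed) competence-based IM signal."""
    records = list(goals.values())
    weights = [math.exp(r.improvement / temperature) for r in records]
    return rng.choices(records, weights=weights, k=1)[0]


def update_after_attempt(record, achieved, rate=0.1):
    """Steps 5-6: monitor achievement and self-generate a learning signal."""
    signal = 1.0 if achieved else 0.0
    error = signal - record.competence  # competence prediction error
    record.competence += rate * error
    # Running average of |error|: it decays once the outcome is predictable.
    record.improvement += rate * (abs(error) - record.improvement)
    return signal


def run(trials=200, seed=0):
    """Toy world (assumed): each change occurs with a fixed probability."""
    rng = random.Random(seed)
    success_prob = {"sphere_touched_left": 0.9, "sphere_touched_right": 0.0}
    goals = {}
    for change in success_prob:
        discover_goal(goals, change)
    for _ in range(trials):
        record = select_goal(goals, rng)
        achieved = rng.random() < success_prob[record.change]
        update_after_attempt(record, achieved)
    return goals


if __name__ == "__main__":
    for name, rec in run().items():
        print(name, round(rec.competence, 3), round(rec.improvement, 3))
```

In this toy setting, a goal that can never be achieved produces no prediction error, so its motivation signal decays and the agent gradually stops selecting it. This is one simple way to illustrate why a competence-based signal can focus exploration on learnable goals.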
