Paper information - Interplay of Rhythmic and Discrete Manipulation Movements During Development: A Policy-Search Reinforcement-Learning Robot Model

The flexibility of human motor behavior strongly relies on rhythmic and discrete movements. Developmental psychology has shown that these movements closely interplay during development, but the dynamics of this interplay are largely unknown, and we currently lack computational models suitable for investigating it. This work first presents an analysis of the problem from a computational and empirical perspective and then proposes a novel computational model as a first step toward investigating it. The model is based on a movement primitive capable of producing both rhythmic and end-point discrete movements, and on a policy-search reinforcement-learning algorithm that can mimic the trial-and-error learning processes underlying development and is efficient enough to run on real robots. The model is tested on hand manipulation tasks (“touching,” “tapping,” and “rotating” an object). The results show how the system progressively shapes the initial rhythmic exploration into refined rhythmic or discrete movements, depending on the task demands. The tests on the real robot also show how the system exploits the specific hand-object physical properties, some possibly shared with developing infants, to find effective solutions to the tasks. Overall, the results indicate that the model is a useful tool for investigating the interplay of rhythmic and discrete movements during development.
