Compliant skills acquisition and multi-optima policy search with EM-based reinforcement learning

The democratization of robotics technology and the development of new actuators progressively bring robots closer to humans. The applications that can now be envisaged contrast sharply with the requirements of industrial robots. In standard manufacturing settings, the criteria used to assess performance are usually related to the robot's accuracy, repeatability, speed or stiffness. Learning a control policy to actuate such robots amounts to searching for a single solution to the task, with a policy representation that moves the robot through a set of points along a trajectory. In new environments such as homes and offices populated with humans, reproduction performance is assessed differently. These robots are expected to acquire rich motor skills that generalize to new situations, while behaving safely in the vicinity of users. Skill acquisition can no longer be guided by a single form of learning, and must instead combine different approaches to continuously create, adapt and refine policies. The family of search strategies based on expectation-maximization (EM) appears particularly promising for coping with these new requirements. Exploration can be performed directly in the policy parameter space, by refining the policy together with exploration parameters represented as covariances. With this formulation, RL can be extended to a multi-optima search problem in which several policy alternatives are considered simultaneously. We present two applications exploiting EM-based exploration strategies, by considering parameterized policies based on dynamical systems, and by using Gaussian mixture models to search for multiple policy alternatives.
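The core loop described above can be illustrated with a minimal sketch of reward-weighted EM policy search over a toy reward landscape. This is not the paper's implementation: the reward function, parameter dimensionality, and update schedule are all hypothetical stand-ins. The E-step samples candidate policy parameters from a Gaussian search distribution and evaluates their return; the M-step performs a reward-weighted update of both the mean (the policy) and the covariance (the exploration parameters), so that exploration magnitude and shape adapt alongside the policy itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a rollout return: a reward peaked at
# theta* = [1.0, -0.5] in a 2-D policy parameter space.
TARGET = np.array([1.0, -0.5])

def rollout_return(theta):
    return np.exp(-np.sum((theta - TARGET) ** 2))

# Gaussian search distribution over policy parameters:
# the mean is the current policy, the covariance encodes exploration.
mu = np.zeros(2)
sigma = np.eye(2)

for _ in range(50):
    # E-step: sample candidate parameters and evaluate their returns.
    samples = rng.multivariate_normal(mu, sigma, size=50)
    returns = np.array([rollout_return(s) for s in samples])

    # Reward-weighted responsibilities (normalized importance weights).
    w = returns / returns.sum()

    # M-step: reward-weighted update of mean and covariance.
    mu = w @ samples
    diff = samples - mu
    sigma = (w[:, None] * diff).T @ diff + 1e-4 * np.eye(2)

print(mu)  # drifts toward the reward peak near [1.0, -0.5]
```

The small diagonal jitter added to the covariance keeps exploration from collapsing entirely. The multi-optima extension mentioned in the abstract replaces the single Gaussian with a Gaussian mixture model, so that each component tracks one policy alternative and the reward weights are combined with the components' posterior responsibilities.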
