Transferring impedance control strategies between heterogeneous systems via apprenticeship learning

We present a novel method for designing controllers for robots with variable impedance actuators. We take an imitation learning approach, whereby we learn impedance modulation strategies from observations of behaviour (for example, that of humans) and transfer these to a robotic plant with very different actuators and dynamics. In contrast to previous approaches, in which impedance characteristics are imitated directly, our method uses task performance as the metric of imitation, ensuring that the learnt controllers are directly optimised for the hardware of the imitator. As a key ingredient, we use apprenticeship learning to model the optimisation criteria underlying observed behaviour, in order to frame a corresponding optimal control problem for the imitator. We then apply local optimal feedback control techniques to find an appropriate impedance modulation strategy under the imitator's dynamics. We test our approach on systems of varying complexity, including a novel, antagonistic series elastic actuator and a biologically realistic two-joint, six-muscle model of the human arm.
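
As a rough illustration of the apprenticeship learning step, the sketch below computes discounted feature expectations from observed trajectories and takes the normalised difference between demonstrator and imitator expectations as a weight direction, in the spirit of feature-expectation matching. It is a minimal sketch under stated assumptions, not the method of this paper: the function names, the discount factor, and the quadratic features (squared tracking error and squared stiffness) are all hypothetical, and the weights are written for a reward, so a cost formulation would flip the sign.

```python
import numpy as np


def feature_expectations(trajectories, features, gamma=0.99):
    """Average discounted feature sum over a set of trajectories.

    trajectories: list of sequences of states (hypothetical format).
    features: callable mapping one state to a 1-D feature vector.
    """
    totals = []
    for traj in trajectories:
        phi = np.array([features(x) for x in traj])   # shape (T, k)
        discounts = gamma ** np.arange(len(traj))     # shape (T,)
        totals.append(discounts @ phi)                # shape (k,)
    return np.mean(totals, axis=0)


def weight_direction(mu_demonstrator, mu_imitator):
    """Unit vector pointing from imitator to demonstrator feature
    expectations, plus the size of the gap between them."""
    diff = mu_demonstrator - mu_imitator
    gap = np.linalg.norm(diff)
    w = diff / gap if gap > 0 else diff
    return w, gap


def quadratic_features(state):
    """Assumed features: squared tracking error and squared stiffness."""
    return np.asarray(state, dtype=float) ** 2


# Toy data: each state is (tracking error, stiffness command).
demo = [[np.array([1.0, 0.0]), np.array([0.0, 2.0])]]
imit = [[np.array([1.0, 0.0]), np.array([0.0, 0.0])]]

mu_demo = feature_expectations(demo, quadratic_features, gamma=0.5)
mu_imit = feature_expectations(imit, quadratic_features, gamma=0.5)
w, gap = weight_direction(mu_demo, mu_imit)
```

In a full pipeline, weights of this kind would define the objective that the imitator then optimises under its own dynamics, for example with a local optimal feedback control solver. That outer loop is omitted here.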
