Improving biped walk stability with complementary corrective demonstration

We contribute a method for improving a robot's skill execution performance by complementing an existing algorithmic solution with corrective human demonstration. We apply the proposed method to biped walking, a good example of a complex low-level skill due to the complicated dynamics of the walk process in a high-dimensional state and action space. We introduce an incremental learning approach to improve the stability of the Nao humanoid robot while walking. First, we identify, extract, and record a complete walk cycle from the motion of the robot as it executes a given walk algorithm as a black box. Second, we apply offline advice operators to improve the stability of the learned open-loop walk cycle. Finally, we present an algorithm that directly modifies the recorded walk cycle using real-time corrective human demonstration. The demonstrator delivers the corrective feedback through a commercially available wireless game controller, without touching the robot. Through the proposed algorithm, the robot learns a closed-loop correction policy for the open-loop walk by mapping the corrective demonstrations to the sensory readings received while walking. Experimental results demonstrate a significant improvement in walk stability.
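The closed-loop correction idea described above can be illustrated with a minimal sketch. The class below is hypothetical and not the paper's implementation: it stores (sensor reading, correction) pairs collected during demonstration and, at runtime, predicts a correction for the current sensor reading via kernel-weighted averaging over the demonstrated pairs (a locally weighted scheme is one plausible choice for such a mapping; the paper's exact policy representation may differ). The predicted correction would then be added to the open-loop walk cycle's joint commands.

```python
import numpy as np

class CorrectionPolicy:
    """Hypothetical sketch of a demonstration-learned correction policy:
    maps sensor readings to joint-angle corrections by kernel-weighted
    averaging over recorded (sensor, correction) demonstration pairs."""

    def __init__(self, bandwidth=0.1):
        self.bandwidth = bandwidth
        self.sensor_data = []   # sensor readings seen during demonstration
        self.corrections = []   # demonstrator's corrections at those readings

    def record(self, sensor_reading, correction):
        """Store one corrective demonstration sample."""
        self.sensor_data.append(np.asarray(sensor_reading, dtype=float))
        self.corrections.append(np.asarray(correction, dtype=float))

    def correct(self, sensor_reading):
        """Predict a correction for the current sensor reading as a
        Gaussian-kernel-weighted average of demonstrated corrections."""
        x = np.asarray(sensor_reading, dtype=float)
        X = np.stack(self.sensor_data)
        Y = np.stack(self.corrections)
        d2 = np.sum((X - x) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * self.bandwidth ** 2))
        if w.sum() < 1e-12:
            # No nearby demonstrations: fall back to the open-loop walk.
            return np.zeros_like(Y[0])
        return (w[:, None] * Y).sum(axis=0) / w.sum()

# Usage sketch: the open-loop walk cycle command is adjusted each tick,
#   command = walk_cycle[t] + policy.correct(current_sensors)
```

This keeps the original walk algorithm as a black box, consistent with the abstract: the learned policy only supplies additive corrections on top of the recorded open-loop cycle.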
