Learning Hardware Dynamics Model from Experiments for Locomotion Optimization

The hardware compatibility of legged locomotion is often characterized by the Zero Moment Point (ZMP), which has been studied extensively for decades. One of the most popular models for computing the ZMP is the linear inverted pendulum (LIP) model, which expresses the ZMP as a linear function of the center of mass (COM) position and its acceleration. In the real world, however, this model may not accurately predict the true ZMP of the hardware, due to factors such as unmodeled dynamics and discrepancies between the simulation model and the physical robot. In this paper, we aim to improve on the theoretical ZMP model by learning the real hardware dynamics from experimental data. We first optimize a motion plan using the theoretical ZMP model and collect center of pressure (COP) data by executing the motion on a force plate. We then train a new ZMP model that maps the motion plan variables to the measured ZMP and use the learned model to find a new, hardware-compatible motion plan. Through various locomotion tasks on a quadruped, we demonstrate that motions planned with the learned ZMP model are feasible on hardware when those planned with the theoretical ZMP model are not. Furthermore, experiments with ZMP models of different complexities reveal that overly complex models may suffer from overfitting, even though they can potentially represent more complex, unmodeled dynamics.
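For reference, the LIP approximation mentioned above relates the ZMP to the COM state in closed form. The following is the standard expression for one horizontal direction, under the usual assumption of a constant COM height $z_0$ (the symbols here are ours, not taken from the paper):

$$ p_x \;=\; x \;-\; \frac{z_0}{g}\,\ddot{x}, $$

where $p_x$ is the ZMP, $x$ and $\ddot{x}$ are the horizontal COM position and acceleration, and $g$ is the gravitational acceleration; the lateral component is analogous.

The learned ZMP model replaces this analytic map with one fitted to force-plate measurements. The sketch below illustrates the general idea under assumptions of our own: COM position and acceleration features per sample, measured COP as the regression target, and a ridge-regularized polynomial regressor standing in for whatever model family the paper actually uses; the file names are hypothetical.

```python
# Minimal sketch of fitting a data-driven ZMP model from force-plate data.
# Assumptions (not from the paper): features are horizontal COM position and
# acceleration per sample; targets are the measured COP; a ridge-regularized
# polynomial regression stands in for the authors' model family.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

# X: (N, 4) array of [com_x, com_y, com_ddx, com_ddy] from the executed plan
# Y: (N, 2) array of measured COP [cop_x, cop_y] from the force plate
X = np.load("motion_features.npy")   # hypothetical file names
Y = np.load("force_plate_cop.npy")

# The polynomial degree controls model complexity; the abstract notes that
# overly complex models can overfit the experimental data.
learned_zmp = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1e-2))
learned_zmp.fit(X, Y)

# The fitted model can then replace the analytic LIP ZMP inside the motion
# optimizer's ZMP-in-support-polygon constraint.
predicted_zmp = learned_zmp.predict(X)
print("Mean ZMP prediction error [m]:", np.abs(predicted_zmp - Y).mean())
```

Increasing the polynomial degree gives the regressor more capacity to capture unmodeled effects, but, as the abstract notes, overly complex models can overfit the limited experimental data.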
