Generalization of Force Control Policies from Demonstrations for Constrained Robotic Motion Tasks

Although learning control policies from demonstrations has been thoroughly investigated in the literature, generalizing these policies to new contexts remains a challenge, since existing approaches exhibit limited performance on new tasks. In this article, we propose two approaches for generalizing motion-based force control policies, with the aim of performing constrained motions in the presence of motion-dependent external forces. The key idea of both methods is to use, in addition to policy values, policy derivatives or differences, which express how the policy varies with variations in its input, and to combine these two kinds of information to estimate the policy at new inputs. The first approach learns policy values and policy derivatives by linear regression and combines them into a first-order, Taylor-like polynomial that estimates the policy at new inputs. The second approach learns policy values and policy differences by locally weighted regression and superimposes them to estimate the policy at new inputs; in this approach, the policy differences represent variations of the policy in the direction that minimizes the distance between the new input and the average demonstrated input. The proposed approaches are evaluated on real-world constrained robot motion tasks using a linearly actuated haptic device with two degrees of freedom.
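The abstract does not specify the input features, the regression models, or how derivatives and differences are extracted from the demonstrations, so the following Python/NumPy sketch only illustrates the two combination rules under explicit assumptions. For the first approach, it assumes the policy derivative J is the slope matrix of a single least-squares fit over all demonstrations and expands around the nearest demonstrated input x0, giving pi(x*) ≈ pi(x0) + J (x* - x0). For the second approach, it assumes the policy differences are formed from all pairs of demonstrations and regressed with Gaussian-weighted locally weighted regression on the corresponding input differences, then superimposed on the average policy: pi(x*) ≈ mean(pi) + delta(x* - mean(x)). The function names `taylor_generalize`, `lwr_predict`, and `superposition_generalize`, the `bandwidth` parameter, and the demo data are all hypothetical.

```python
import numpy as np


def taylor_generalize(X, Y, x_new):
    """Approach 1 (sketch): first-order Taylor-like estimate of the policy.

    X: (n, d) demonstrated inputs; Y: (n, m) demonstrated policy values.
    Assumption: the policy derivative is the slope matrix of one ordinary
    least-squares (linear regression) fit over all demonstrations.
    """
    n = X.shape[0]
    A = np.hstack([X, np.ones((n, 1))])            # affine design matrix
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)   # shape (d + 1, m)
    J = coef[:-1].T                                # (m, d) derivative estimate
    i0 = np.argmin(np.linalg.norm(X - x_new, axis=1))  # nearest demonstration
    # pi(x_new) ~= pi(x0) + J (x_new - x0)
    return Y[i0] + J @ (x_new - X[i0])


def lwr_predict(q, X, Y, bandwidth):
    """Locally weighted linear regression evaluated at a single query q."""
    w = np.exp(-np.sum((X - q) ** 2, axis=1) / (2.0 * bandwidth ** 2))
    A = np.hstack([X - q, np.ones((X.shape[0], 1))])  # centered at the query
    sw = np.sqrt(w)[:, None]                           # sqrt weights for lstsq
    coef, *_ = np.linalg.lstsq(sw * A, sw * Y, rcond=None)
    return coef[-1]                                    # intercept = value at q


def superposition_generalize(X, Y, x_new, bandwidth=1.0):
    """Approach 2 (sketch): average policy plus an LWR-learned difference.

    Assumption: policy differences come from all ordered pairs of
    demonstrations and are regressed on the corresponding input differences.
    """
    n = X.shape[0]
    x_bar, y_bar = X.mean(axis=0), Y.mean(axis=0)
    off_diag = ~np.eye(n, dtype=bool)                  # exclude i == j pairs
    dX = (X[:, None, :] - X[None, :, :])[off_diag]     # (n(n-1), d)
    dY = (Y[:, None, :] - Y[None, :, :])[off_diag]     # (n(n-1), m)
    # Policy variation when moving from the average input x_bar to x_new.
    delta = lwr_predict(x_new - x_bar, dX, dY, bandwidth)
    return y_bar + delta


if __name__ == "__main__":
    # Hypothetical demonstrations: 2-D inputs, 1-D policy output (linear map).
    X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    Y = X @ np.array([[2.0], [-1.0]]) + 0.5
    x_new = np.array([0.5, 2.0])
    print(taylor_generalize(X, Y, x_new))
    print(superposition_generalize(X, Y, x_new))
```

As a sanity check, when the demonstrated policy is an exactly linear function of its input, both estimators reproduce that linear map at any new input, including inputs outside the demonstrated range. Real force control policies are generally nonlinear, and extrapolating far from the demonstrations with either rule should be treated with caution.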
