Supervised Learning and Reinforcement Learning of Feedback Models for Reactive Behaviors: Tactile Feedback Testbed

Robots need to adapt to unexpected changes in their environment in order to succeed in their tasks autonomously. However, hand-designing feedback models for such adaptation is tedious, if possible at all, which makes data-driven methods a promising alternative. In this paper we introduce a complete framework for learning feedback models for reactive motion planning. Our pipeline starts by segmenting demonstrations of a complete task into motion primitives via a semi-automated segmentation algorithm. Then, given additional demonstrations of successful adaptation behaviors, we learn initial feedback models through learning from demonstration. In the final phase, a sample-efficient reinforcement learning algorithm fine-tunes these feedback models for novel task settings, requiring only a few interactions with the real system. We evaluate our approach on a real anthropomorphic robot learning a tactile feedback task.

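To make the pipeline concrete, the following is a minimal sketch of its last two stages under strong simplifying assumptions: the motion primitive and segmentation are taken as given, the feedback model is a linear map from sensor-error features to an additive correction term, and fine-tuning uses a simple reward-weighted policy search. All function names, features, and rewards below are hypothetical illustrations, not the paper's implementation.

```python
# Hedged sketch: supervised initialization of a feedback model from
# demonstrated corrections, followed by sample-efficient refinement via
# reward-weighted policy search.  All quantities are synthetic.
import numpy as np

def fit_feedback_model(features, corrections, reg=1e-3):
    """Stage 2 (learning from demonstration): ridge-regression fit of a
    linear feedback model mapping sensor-error features to corrections."""
    F = np.asarray(features)                 # (T, d) sensor-error features
    y = np.asarray(corrections)              # (T,)  demonstrated corrections
    A = F.T @ F + reg * np.eye(F.shape[1])
    return np.linalg.solve(A, F.T @ y)       # weights w; correction = F @ w

def rollout_return(w, features, target):
    """Hypothetical task reward: negative squared error between the model's
    correction and the correction required in the novel setting."""
    return -np.sum((features @ w - target) ** 2)

def finetune(w0, features, target, iters=30, samples=16, sigma=0.05, temp=1.0):
    """Stage 3 (reinforcement learning): reward-weighted averaging of
    perturbed weight vectors, refining the initial feedback model."""
    w = w0.copy()
    for _ in range(iters):
        eps = sigma * np.random.randn(samples, w.size)
        returns = np.array([rollout_return(w + e, features, target) for e in eps])
        probs = np.exp(temp * (returns - returns.max()))
        probs /= probs.sum()
        w = w + probs @ eps                  # exploration-weighted update
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(200, 5))                     # fake sensor-error features
    w_true = np.array([0.5, -1.0, 0.2, 0.0, 0.8])
    demo_corrections = feats @ w_true + 0.01 * rng.normal(size=200)
    w_init = fit_feedback_model(feats, demo_corrections)  # learned from demonstrations
    novel_target = feats @ (w_true + 0.3)                 # shifted, novel task setting
    w_refined = finetune(w_init, feats, novel_target)     # RL fine-tuning
    print("initial return:", rollout_return(w_init, feats, novel_target))
    print("refined return:", rollout_return(w_refined, feats, novel_target))
```

In the sketch, the supervised fit provides a good starting point from demonstrations, and the policy-search loop only needs a handful of perturbed rollouts per iteration, which is the sample-efficiency property the abstract emphasizes.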