Programming by Visual Demonstration for Pick-and-Place Tasks using Robot Skills

In this paper, we present a vision-based robot programming system that generates pick-and-place programs from human demonstrations. The system consists of a detection network and a program generation module. The detection network leverages convolutional pose machines to detect the key-points of the objects, and is trained in a simulation environment in which the training set is collected and auto-labeled. To bridge the gap between reality and simulation, we propose a method for designing a transform function that maps real images to the synthesized style. Compared with the unmapped results, this mapping reduces the Mean Absolute Error (MAE) of the model trained entirely on synthesized images by 23% and the False Negative Rate (FNR) of the model fine-tuned on real images by 42.5%. The program generation module produces a human-readable program that reproduces a real-world demonstration from the detection results; within it, a long-short memory (LSM) is designed to integrate current and historical information. The system is tested in the real world with a UR5 robot on the task of stacking colored cubes in different orders.

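To make the described pipeline concrete, the sketch below shows one way the real-to-synthesized style mapping, the key-point detector, and the program generation step could fit together. It is a minimal illustration written for this summary, not the authors' code: the function names, the Gaussian-blur/gamma form of the transform, and the detector stub are all assumptions.

    import numpy as np
    import cv2  # OpenCV, assumed available for basic image filtering

    def map_to_synthesized_style(image_bgr, gamma=1.2, blur_kernel=5):
        """Assumed transform: smooth texture and flatten contrast so a real
        camera frame more closely resembles the simulator's rendering style."""
        smoothed = cv2.GaussianBlur(image_bgr, (blur_kernel, blur_kernel), 0)
        normalized = smoothed.astype(np.float32) / 255.0
        corrected = np.power(normalized, 1.0 / gamma)  # simple gamma adjustment
        return (corrected * 255.0).astype(np.uint8)

    def detect_keypoints(image_bgr):
        """Placeholder for the convolutional-pose-machine detector; returns a
        mapping from object name to an (x, y) key-point in pixel coordinates."""
        raise NotImplementedError("trained detector goes here")

    def generate_program(keypoint_history):
        """Toy stand-in for the program generation module: compares each frame's
        detections with the previous ones and emits human-readable skill calls."""
        program = []
        previous = {}
        for frame in keypoint_history:
            for name, (x, y) in frame.items():
                if name in previous and previous[name] != (x, y):
                    program.append(f"pick('{name}')")
                    program.append(f"place('{name}', at=({x}, {y}))")
                previous[name] = (x, y)
        return program

In an actual run, each camera frame would be mapped, passed to the trained detector, and the per-frame key-points accumulated into keypoint_history before the program is emitted and executed on the UR5 robot; the paper's LSM plays the role of the simple history comparison shown here.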