Object Detection-Based One-Shot Imitation Learning with an RGB-D Camera

End-to-end robot learning has achieved great success in enabling robots to acquire a variety of manipulation skills. It learns a function that maps visual information directly to robot actions. Because of the diversity of target objects, however, most end-to-end approaches focus on a single object-specific task and generalize poorly. In this work, an object detection-based one-shot imitation learning method is proposed that separates semantic understanding from robot control. It enables a robot to acquire similar manipulation skills efficiently and to cope with new objects from a single demonstration. The approach consists of two modules: an object detection network and a motion policy network. From RGB images, the object detection network predicts the task-related semantic keypoint of the target object, which in this application is the center of the container; the motion policy network then generates the motion action from the depth map and the detected keypoint. To evaluate the proposed pipeline, a series of experiments is conducted on typical placing tasks in different simulation scenarios, and the learned policy is additionally transferred from simulation to the real world without any fine-tuning.
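
To make the two-module structure concrete, below is a minimal sketch in PyTorch of how such a pipeline could be wired together. This is not the paper's implementation: the class names (KeypointDetector, MotionPolicy), the backbone layer sizes, the tensor shapes, and the 7-dimensional action output are illustrative assumptions; only the interface, RGB image to keypoint and (depth map, keypoint) to action, follows the description above.

```python
# Illustrative sketch (not the authors' code): a two-module pipeline in PyTorch.
# Module names, layer sizes, and tensor shapes are assumptions for clarity.
import torch
import torch.nn as nn

class KeypointDetector(nn.Module):
    """Predicts a 2D task-related keypoint (e.g. container center) from an RGB image."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(  # hypothetical small conv backbone
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 2)  # (u, v) pixel coordinates of the keypoint

    def forward(self, rgb):           # rgb: (B, 3, H, W)
        feat = self.backbone(rgb).flatten(1)
        return self.head(feat)        # (B, 2)

class MotionPolicy(nn.Module):
    """Maps a depth map plus the detected keypoint to a robot motion command."""
    def __init__(self, action_dim=7):  # action_dim is an assumption
        super().__init__()
        self.depth_encoder = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.mlp = nn.Sequential(
            nn.Linear(64 + 2, 128), nn.ReLU(),
            nn.Linear(128, action_dim),  # e.g. end-effector motion + gripper
        )

    def forward(self, depth, keypoint):  # depth: (B, 1, H, W), keypoint: (B, 2)
        feat = self.depth_encoder(depth).flatten(1)
        return self.mlp(torch.cat([feat, keypoint], dim=1))

# Usage: the detection output feeds the policy, so semantics (which object, where)
# stay in the detector while the policy only handles geometry and control.
detector, policy = KeypointDetector(), MotionPolicy()
rgb = torch.randn(1, 3, 128, 128)
depth = torch.randn(1, 1, 128, 128)
action = policy(depth, detector(rgb))  # (1, 7) motion command
```

A practical consequence of this split, as the abstract suggests, is that adapting to a new object only requires updating the detector (here, from a single demonstration), while the depth-based policy can remain fixed, which also eases simulation-to-real transfer.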
