Learning Affordance Space in Physical World for Vision-based Robotic Object Manipulation

What is a proper representation of objects for manipulation? What would a human try to perceive when manipulating a new object in a new environment? Rather than focusing on texture and illumination, humans can infer the "affordance" [36] of objects from vision, where "affordance" describes an object's intrinsic property that affords a particular type of manipulation. In this work, we investigate whether such affordance can be learned by a deep neural network. In particular, we propose an Affordance Space Perception Network (ASPN) that takes an image as input and outputs an affordance map. Unlike existing works that infer pixel-wise affordance probabilities in image space, our affordance is defined in real-world space, which eliminates the need for hand-eye calibration. In addition, we extend the representational ability of affordance by defining it in a 3D affordance space, and we propose a novel training strategy to improve performance. Trained purely on simulation data, ASPN achieves strong performance in the real world. It is a task-agnostic framework that can handle different objects, scenes, and viewpoints. Extensive real-world experiments demonstrate the accuracy and robustness of our approach: we achieve success rates of 94.2% for single-object pushing, 92.4% for multi-object pushing, 97.2% for single-object grasping, and 95.4% for multi-object grasping, outperforming current state-of-the-art methods.
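To make the idea of an affordance defined in real-world space (rather than image space) concrete, below is a minimal, hypothetical PyTorch sketch of a network that maps an RGB image to a discretized 3D affordance volume over the robot workspace. The architecture, layer sizes, class name `AffordanceSpaceNet`, and parameters `depth_bins` and `grid_hw` are assumptions for illustration only; they are not the paper's ASPN architecture.

```python
# Hypothetical sketch: image -> 3D affordance volume in workspace coordinates.
# Because the output grid is defined in the real-world workspace (not in image
# space), selecting an action from it does not require hand-eye calibration.
import torch
import torch.nn as nn


class AffordanceSpaceNet(nn.Module):
    def __init__(self, depth_bins=8, grid_hw=64):
        super().__init__()
        self.depth_bins = depth_bins
        self.grid_hw = grid_hw
        # Encoder: RGB image -> feature map (backbone choice is an assumption).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Decoder: upsample to a (depth_bins, grid_hw, grid_hw) volume, i.e.
        # one manipulation-success score per cell of the discretized workspace.
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(size=(grid_hw, grid_hw), mode="bilinear",
                        align_corners=False),
            nn.Conv2d(64, depth_bins, 1),  # one channel per height slice
        )

    def forward(self, rgb):
        # rgb: (B, 3, H, W); returns (B, depth_bins, grid_hw, grid_hw) scores.
        feats = self.encoder(rgb)
        return torch.sigmoid(self.decoder(feats))


if __name__ == "__main__":
    net = AffordanceSpaceNet()
    scores = net(torch.randn(1, 3, 256, 256))
    # Pick the highest-scoring cell and recover its (z, y, x) grid index; the
    # metric extents that map indices to workspace coordinates are assumptions.
    flat_idx = scores.flatten(1).argmax(dim=1).item()
    hw = net.grid_hw
    z, rem = divmod(flat_idx, hw * hw)
    y, x = divmod(rem, hw)
    print(scores.shape, (z, y, x))
```

In this sketch the argmax cell would be converted to a target pose through a fixed grid-to-workspace mapping, which is what removes the dependence on per-setup camera calibration at execution time.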

[1] Xinyu Liu, et al. Dex-Net 3.0: Computing Robust Robot Suction Grasp Targets in Point Clouds using a New Analytic Model and Deep Learning, 2017, arXiv.

[2] James J. Gibson, et al. The Ecological Approach to Visual Perception: Classic Edition, 2014.

[3] Surya P. N. Singh, et al. V-REP: A versatile and scalable robot simulation framework, 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[4] R. Brooks. Planning Collision-Free Motions for Pick-and-Place Operations, 1983.

[5] Pieter Abbeel, et al. Learning Robotic Assembly from CAD, 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[6] George Trigeorgis, et al. Domain Separation Networks, 2016, NIPS.

[7] Marcin Andrychowicz, et al. Sim-to-Real Transfer of Robotic Control with Dynamics Randomization, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[8] Atabak Dehban, et al. A deep probabilistic framework for heterogeneous self-supervised learning of affordances, 2017, 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids).

[9] Stefan Leutenegger, et al. Deep learning a grasp function for grasping under gripper pose uncertainty, 2016, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[10] Andrew J. Davison, et al. Sim-to-Real Reinforcement Learning for Deformable Object Manipulation, 2018, CoRL.

[11] Sergey Levine, et al. One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning, 2018, Robotics: Science and Systems.

[12] Oliver Kroemer, et al. Learning Continuous Grasp Affordances by Sensorimotor Exploration, 2010, From Motor Learning to Interaction Learning in Robots.

[13] Sergey Levine, et al. Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping, 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[14] Sergey Levine, et al. (CAD)²RL: Real Single-Image Flight without a Single Real Image, 2016, Robotics: Science and Systems.

[15] Sergey Levine, et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, 2018, CoRL.

[16] Richard M. Murray, et al. A Mathematical Introduction to Robotic Manipulation, 1994.

[17] Oliver Kroemer, et al. Learning grasp affordance densities, 2011, Paladyn J. Behav. Robotics.

[18] Sergey Levine, et al. GPLAC: Generalizing Vision-Based Robotic Skills Using Weakly Labeled Images, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[19] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.

[20] Eduardo F. Morales, et al. Enhancing object, action, and effect recognition using probabilistic affordances, 2019, Adapt. Behav.

[21] Olivier Sigaud, et al. From Motor Learning to Interaction Learning in Robots, 2010, From Motor Learning to Interaction Learning in Robots.

[22] Honglak Lee, et al. Deep learning for detecting robotic grasps, 2013, Int. J. Robotics Res.

[23] Kate Saenko, et al. Learning a visuomotor controller for real world robotic grasping using simulated depth images, 2017, CoRL.

[24] Hui Cheng, et al. MetaGrasp: Data Efficient Grasping by Affordance Interpreter Network, 2019, 2019 International Conference on Robotics and Automation (ICRA).

[25] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[26] Van-Duc Nguyen, et al. Constructing Force-Closure Grasps, 1988, Int. J. Robotics Res.

[27] Wojciech Zaremba, et al. Domain randomization for transferring deep neural networks from simulation to the real world, 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[28] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[29] Wojciech Zaremba, et al. Domain Randomization and Generative Models for Robotic Grasping, 2017, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[30] Xinyu Liu, et al. Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics, 2017, Robotics: Science and Systems.

[31] Ian Taylor, et al. Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[32] Stefan Schaal, et al. Learning and generalization of motor skills by learning from demonstration, 2009, 2009 IEEE International Conference on Robotics and Automation.

[33] Abhinav Gupta, et al. Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours, 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[34] Sergey Levine, et al. Sim2Real View Invariant Visual Servoing by Recurrent Control, 2017, arXiv.

[35] Sergey Levine, et al. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection, 2016, Int. J. Robotics Res.

[36] Marcin Andrychowicz, et al. Asymmetric Actor Critic for Image-Based Robot Learning, 2017, Robotics: Science and Systems.

[37] Rouhollah Rahmatizadeh, et al. Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-to-End Learning from Demonstration, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[38] Mrinal Kalakrishnan, et al. Multi-Task Domain Adaptation for Deep Learning of Instance Grasping from Simulation, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[39] Peter Corke, et al. Closing the Loop for Robotic Grasping: A Real-time, Generative Grasp Synthesis Approach, 2018, Robotics: Science and Systems.