Learning a visuomotor controller for real world robotic grasping using simulated depth images

We want to build robots that are useful in unstructured real-world settings, such as the household. Grasping in particular is an important skill in this domain, yet it remains challenging. One of the key hurdles is handling unexpected changes or motion of the object being grasped, as well as kinematic noise and other errors in the robot. This paper proposes an approach to learning a closed-loop controller for robotic grasping that dynamically guides the gripper to the object. We use a wrist-mounted sensor to acquire depth images in front of the gripper and train a convolutional neural network that, for candidate grasp configurations over an image, estimates the distance to the nearest true grasp. The training sensor data is generated entirely in simulation, a major advantage over previous work that relies on real robot experience, which is costly to obtain. Despite being trained in simulation, our approach works well on real, noisy sensor images. In simulated and real robot experiments, we compare our controller to a strong grasp pose detection baseline and find that our approach significantly outperforms it in the presence of kinematic noise, perceptual errors, and disturbances of the object during grasping.
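To make the learned distance function concrete, the sketch below shows one plausible realization in PyTorch: a small CNN scores candidate gripper motions against the current wrist-camera depth image, and a greedy closed-loop step moves along the motion with the lowest predicted distance to a true grasp. This is a minimal illustration under assumed design choices, not the paper's actual architecture; the names `GraspDistanceNet` and `select_action`, the network sizes, and the three-DOF `(dx, dy, dtheta)` action parameterization are all hypothetical.

```python
import torch
import torch.nn as nn

class GraspDistanceNet(nn.Module):
    """Hypothetical CNN mapping a depth image and a candidate gripper
    motion to a scalar estimate of the distance to the nearest grasp."""
    def __init__(self):
        super().__init__()
        # Small convolutional encoder for the wrist-camera depth image.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # Fuse image features with the candidate motion (dx, dy, dtheta)
        # and regress the distance-to-grasp; LazyLinear infers the
        # flattened feature size on the first forward pass.
        self.head = nn.Sequential(
            nn.LazyLinear(64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, depth, action):
        feats = self.encoder(depth)
        return self.head(torch.cat([feats, action], dim=1))

def select_action(net, depth, candidate_actions):
    """Greedy closed-loop step: evaluate sampled gripper motions and
    pick the one predicted to be closest to a true grasp."""
    with torch.no_grad():
        batch = depth.expand(candidate_actions.shape[0], -1, -1, -1)
        dists = net(batch, candidate_actions).squeeze(1)
    return candidate_actions[torch.argmin(dists)]

# Hypothetical closed-loop usage: sample small gripper motions, move
# along the best one, re-image, and repeat until a grasp is reached.
net = GraspDistanceNet()
depth = torch.rand(1, 1, 64, 64)               # stand-in depth image
candidates = torch.rand(128, 3) * 0.02 - 0.01  # small candidate moves
best = select_action(net, depth, candidates)
```

In this reading, training targets would come from simulated rollouts where the distance from each sampled gripper pose to the nearest known grasp can be computed exactly, so the network can be fit with an ordinary regression loss; the controller then only needs repeated forward passes at run time, which is what makes closed-loop correction of object motion and kinematic error cheap.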
