Learning Hand-Eye Coordination for Robotic Grasping with Large-Scale Data Collection

We describe a learning-based approach to hand-eye coordination for robotic grasping from monocular images. We trained a large convolutional neural network to predict the probability that a task-space motion of the gripper will result in a successful grasp, using only monocular camera images and independently of camera calibration or the current robot pose. This requires the network to observe the spatial relationship between the gripper and objects in the scene, and thus to learn hand-eye coordination. We then use this network to servo the gripper in real time toward successful grasps. To train the network, we collected over 800,000 grasp attempts over the course of two months, using between 6 and 14 robotic manipulators at any given time, with differences in camera placement and hardware. Our experimental evaluation demonstrates that the method achieves effective real-time control, successfully grasps novel objects, and corrects mistakes through continuous servoing.
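
The abstract compresses the whole pipeline into a few sentences; the sketch below makes the core loop concrete. A network g(I, v) maps the current monocular image I and a candidate task-space motion command v to a success probability, and a small cross-entropy-method (CEM) search over v selects the command to execute at each servoing step. The paper does select motion commands with a CEM-style optimizer, but every concrete detail here (network depth, the 64x64 input, motion_dim = 5, the CEM settings, and all names such as GraspSuccessNet and cem_select_motion) is an illustrative assumption rather than the authors' implementation.

    # Minimal sketch, not the paper's architecture: a CNN scores candidate
    # gripper motions from a monocular image, and CEM picks the best motion.
    import torch
    import torch.nn as nn

    class GraspSuccessNet(nn.Module):
        """Predicts P(success) for a task-space motion given an image."""
        def __init__(self, motion_dim: int = 5):  # motion_dim is assumed
            super().__init__()
            # Small convolutional trunk over an assumed 64x64 RGB frame.
            self.conv = nn.Sequential(
                nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
                nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            # Fuse image features with the candidate motion command.
            self.head = nn.Sequential(
                nn.Linear(64 + motion_dim, 64), nn.ReLU(),
                nn.Linear(64, 1),
            )

        def forward(self, image: torch.Tensor, motion: torch.Tensor) -> torch.Tensor:
            feats = self.conv(image)
            logits = self.head(torch.cat([feats, motion], dim=-1))
            return torch.sigmoid(logits).squeeze(-1)  # success probability

    def cem_select_motion(net, image, motion_dim=5, iters=3, samples=64, elites=6):
        """Cross-entropy method: refit a Gaussian to the top-scoring motions."""
        mean = torch.zeros(motion_dim)
        std = torch.ones(motion_dim)
        for _ in range(iters):
            candidates = mean + std * torch.randn(samples, motion_dim)
            with torch.no_grad():
                scores = net(image.expand(samples, -1, -1, -1), candidates)
            top = candidates[scores.topk(elites).indices]
            mean, std = top.mean(0), top.std(0) + 1e-6
        return mean  # motion command to execute for this servoing step

    if __name__ == "__main__":
        net = GraspSuccessNet()
        frame = torch.rand(1, 3, 64, 64)  # stand-in for a camera frame
        print(cem_select_motion(net, frame))

Because the optimizer is re-run on each new camera frame, the controller can revise its motion command as the scene changes, which is what lets the servoing loop correct mistakes mid-grasp.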
