Visual-Guided Robot Arm Using Multi-Task Faster R-CNN

The limitations of current visual recognition methods are a major obstacle to applying automated robot arm systems in industrial projects, which require high precision and speed. In this work, we present a multi-task deep neural network based on Faster R-CNN that simultaneously performs three tasks: object detection, category classification, and object angle estimation. The outputs of all three tasks are then used to determine a picking point and a gripper rotation angle for a pick-and-place robot arm system. Test results show that our network achieves a mean average precision of 86.6% at an IoU (Intersection over Union) threshold of 0.7, and a mean accuracy of 83.5% for the final prediction, which combines object localization and angle estimation. In addition, the proposed multi-task network takes approximately 0.072 seconds to process an image, which is acceptable for pick-and-place robot arms.
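To make the evaluation criterion concrete, the following is a minimal sketch of how IoU between a predicted and a ground-truth box is typically computed, and of one possible way to turn a detected box and an estimated angle into a pick pose. It is an illustration under stated assumptions, not the authors' implementation: the names `Box`, `iou`, and `pick_pose` are hypothetical, boxes are assumed to be axis-aligned in pixel coordinates, the picking point is assumed to be the box center, and the gripper is assumed to be symmetric so that angles are equivalent modulo 180 degrees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def pick_pose(box: Box, angle_deg: float) -> tuple[float, float, float]:
    """Return (x, y, gripper angle in degrees) for a detection.

    Assumption: the picking point is the box center and the gripper is
    symmetric, so the angle is reduced to the range [0, 180).
    """
    cx = (box.x1 + box.x2) / 2.0
    cy = (box.y1 + box.y2) / 2.0
    return cx, cy, angle_deg % 180.0


if __name__ == "__main__":
    predicted = Box(0, 0, 10, 10)
    ground_truth = Box(5, 0, 15, 10)
    score = iou(predicted, ground_truth)
    # A detection counts as correct at the 0.7 threshold only if score >= 0.7.
    print(f"IoU = {score:.3f}, match at 0.7: {score >= 0.7}")
    print(pick_pose(predicted, 190.0))
```

Under this criterion, a detection whose box overlaps the ground truth by half its width, as in the usage above, would not count as a correct localization at the 0.7 threshold.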
