Depth Image–Based Deep Learning of Grasp Planning for Textureless Planar-Faced Objects in Vision-Guided Robotic Bin-Picking

Bin-picking of small parcels and other textureless planar-faced objects is a common task in warehouses. A typical color image–based vision-guided robotic picking system requires feature extraction and the preparation of goal images for each object. However, extracting features for goal-image matching is difficult for textureless objects, and preparing huge numbers of goal images in advance is impractical in a warehouse. In this paper, we propose a novel depth image–based vision-guided robotic bin-picking system for textureless planar-faced objects. Our method uses a deep convolutional neural network (DCNN) model, trained on 15,000 annotated depth images synthetically generated in a physics simulator, to directly predict grasp points without object segmentation. Unlike previous studies that predicted grasp points for a robot suction hand with only one vacuum cup, our DCNN also predicts the optimal grasp pattern for a hand with two vacuum cups (left cup on, right cup on, or both cups on). Further, we propose a surface feature descriptor that extracts surface features (center position and normal) and refines the predicted grasp point position, removing the need both for texture features in vision-guided robot control and for sim-to-real adjustment in DCNN model training. Experimental results demonstrate the efficiency of our system: a robot with 7 degrees of freedom picks randomly posed textureless boxes in a cluttered environment with a 97.5% success rate at speeds exceeding 1,000 pieces per hour.
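
The surface feature descriptor mentioned above recovers a planar patch's center and normal from depth data and uses them to refine the DCNN-predicted grasp point. The paper's own implementation is not reproduced here, but a minimal Python sketch of the underlying idea, assuming the depth pixels around the prediction have already been back-projected to camera-frame 3D points, might look as follows; the function names and the camera-facing orientation check are illustrative assumptions, not the authors' code.

    import numpy as np

    def estimate_surface_feature(points):
        # points: (N, 3) array of camera-frame 3D points back-projected from
        # the depth pixels in a small window around a predicted grasp point.
        center = points.mean(axis=0)
        # The right singular vector with the smallest singular value of the
        # centered patch is the least-squares plane normal.
        _, _, vt = np.linalg.svd(points - center, full_matrices=False)
        normal = vt[-1]
        # Orient the normal toward the camera (assumed at the origin, looking
        # along +z) so that it points out of the visible face.
        if normal[2] > 0:
            normal = -normal
        return center, normal

    def refine_grasp_point(predicted_point, center, normal):
        # Snap the DCNN-predicted grasp point onto the fitted plane by
        # removing its out-of-plane offset component.
        offset = predicted_point - center
        return predicted_point - np.dot(offset, normal) * normal

    # Example: refine a slightly off-surface prediction on a noisy tilted patch.
    rng = np.random.default_rng(0)
    xy = rng.uniform(-0.05, 0.05, size=(200, 2))
    z = 0.6 + 0.2 * xy[:, 0] + 0.001 * rng.standard_normal(200)
    patch = np.column_stack([xy, z])
    center, normal = estimate_surface_feature(patch)
    refined = refine_grasp_point(np.array([0.0, 0.0, 0.62]), center, normal)

A least-squares plane fit is a natural choice for planar-faced objects because the center and normal it yields are exactly the two quantities the abstract says the descriptor extracts, and depth alone suffices to compute them, with no texture features required.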
