Visual Object Recognition and Pose Estimation Based on a Deep Semantic Segmentation Network

In recent years, deep learning-based object recognition algorithms have become increasingly prevalent in robotic vision applications. This paper presents the design of a novel deep learning-based visual object recognition and pose estimation system for a robot manipulator handling random object picking tasks. The proposed visual control system consists of a visual perception module, an object pose estimation module, a data augmentation module, and a robot manipulator controller. The visual perception module combines deep convolutional neural networks (CNNs) with a fully connected conditional random field (CRF) layer to realize image semantic segmentation, which provides stable and accurate object classification results in cluttered environments. The object pose estimation module implements a model-based method to estimate the 3D pose of the target object for picking control. In addition, the proposed data augmentation module automatically generates training data for the deep CNN. Experimental results show that the scene segmentation method used in the data augmentation module reaches a high average accuracy of 97.10%, outperforming other state-of-the-art segmentation methods. Moreover, with the proposed data augmentation module, the visual perception module achieves accuracy rates above 80% and 72% when detecting and recognizing one object and three objects, respectively. The proposed model-based pose estimation method also provides accurate 3D pose estimates: the average translation and rotation errors along the three axes are all smaller than 0.52 cm and 3.95 degrees, respectively. These advantages make the proposed visual control system suitable for random object picking and manipulation applications.
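At the core of any model-based 3D pose estimation pipeline like the one described above is a rigid alignment step that registers a known object model to observed scene points. The paper does not specify its registration algorithm here, so the following is only a toy illustration of that underlying step, using the standard Kabsch (SVD-based) solution for the rotation and translation between two corresponding point sets:

```python
import numpy as np

def kabsch(src, dst):
    """Estimate the rigid transform (R, t) mapping src points onto dst
    points via the Kabsch algorithm (SVD of the cross-covariance)."""
    src_c = src.mean(axis=0)          # centroids of each point set
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Synthetic check: rotate a random model cloud by 30 deg about z,
# translate it, then recover the transform from correspondences.
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 0.3])
model = np.random.default_rng(0).random((50, 3))
scene = model @ R_true.T + t_true
R_est, t_est = kabsch(model, scene)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```

In practice the correspondences are unknown, so this closed-form solve is wrapped inside an iterative scheme such as ICP, seeded by a global feature-based alignment; the 6D pose reported to the picking controller is simply the converged (R, t).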
