Toward 6 DOF Object Pose Estimation with Minimum Dataset

In this research, we propose a method for estimating the 6 DOF pose (3D orientation and position) of an object based on convolutional neural networks (CNNs). We propose RotationCNN, which predicts the 3D orientation of the object. The position of the object is estimated using an object detection CNN that predicts the class of the object and a bounding box around it. Unlike methods that train CNNs on a large-scale database, the proposed system is trained with a minimal dataset obtained in a local environment similar to where the robot is used. With the proposed semi-automated dataset collection technique, based on a web camera and AR markers, users in different environments can train a network suited to their own environment relatively easily and quickly. We believe this approach is suitable for practical robotic applications. The results on 3D orientation prediction with RotationCNN show an average error of 18.9 degrees, which we empirically found to be low enough to serve as an initial solution for the iterative closest point (ICP) algorithm, which uses depth data to refine the pose obtained with the CNNs. The effectiveness of the proposed method is validated by applying it to object grasping with a robot manipulator.
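The refinement step described above is standard point-to-point ICP: the CNN pose serves as the initial guess, and the algorithm alternates nearest-neighbour matching with a closed-form rigid re-alignment. As a hedged illustration only (not the authors' implementation; function names and the brute-force matching are our own simplification), the core of such a refinement can be sketched as:

```python
import numpy as np

def best_fit_transform(A, B):
    """Least-squares rigid transform (R, t) mapping points A onto
    corresponding points B, via the SVD method of Besl & McKay."""
    cA, cB = A.mean(axis=0), B.mean(axis=0)
    H = (A - cA).T @ (B - cB)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = cB - R @ cA
    return R, t

def icp(source, target, init_R=np.eye(3), init_t=np.zeros(3), iters=30):
    """Basic point-to-point ICP. init_R/init_t play the role of the
    CNN-predicted pose; the loop refines them against the depth cloud."""
    R, t = init_R.copy(), init_t.copy()
    src = (R @ source.T).T + t
    for _ in range(iters):
        # brute-force nearest neighbours (fine for small clouds)
        d = np.linalg.norm(src[:, None, :] - target[None, :, :], axis=2)
        matched = target[d.argmin(axis=1)]
        dR, dt = best_fit_transform(src, matched)
        R, t = dR @ R, dR @ t + dt
        src = (dR @ src.T).T + dt
    return R, t
```

This is why a coarse CNN estimate suffices: ICP only converges to the correct pose when the initial guess is close enough that nearest-neighbour matching finds mostly correct correspondences, which is the role the roughly-20-degree-accurate RotationCNN output plays here.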
