Median-Shape Representation Learning for Category-Level Object Pose Estimation in Cluttered Environments

In this paper, we propose an occlusion-robust method for estimating the pose of an unknown object instance within a known object category from a depth image. In a cluttered environment, objects often occlude one another. To estimate the pose of an object in such a situation, a method that de-occludes the object, that is, recovers its unobservable area, would be effective. However, there are two difficulties: occlusion causes an offset between the center of the actual object and the center of its observable area, and different instances in a category may have different shapes. To cope with these difficulties, we propose a two-stage Encoder-Decoder model that extracts features from objects whose centers are aligned to the image center. Within the model, we also propose the Median-shape Reconstructor as the second stage to absorb shape variations within a category. By evaluating the method on both a large-scale virtual dataset and a real dataset, we confirmed that the proposed method achieves good performance on pose estimation of an occluded object from a depth image.
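The center offset mentioned above can be illustrated with a minimal sketch. It is not the proposed model: it only computes the centroid of a binary object mask before and after a simulated occlusion. The 64x64 image size, the square object, the occluder placement, and the helper name `mask_center` are all assumptions made for illustration.

```python
import numpy as np

def mask_center(mask):
    """Return the (row, col) centroid of the True pixels in a binary mask."""
    rows, cols = np.nonzero(mask)
    return np.array([rows.mean(), cols.mean()])

# Hypothetical 64x64 mask of a square object covering rows and columns 16..47.
full_mask = np.zeros((64, 64), dtype=bool)
full_mask[16:48, 16:48] = True

# Simulate an occluder hiding the left half of the object (columns 16..31).
visible_mask = full_mask.copy()
visible_mask[:, 16:32] = False

true_center = mask_center(full_mask)        # center of the actual object
visible_center = mask_center(visible_mask)  # center of the observable area
center_offset = visible_center - true_center

print("true center:   ", true_center)
print("visible center:", visible_center)
print("offset:        ", center_offset)
```

In this toy case the observable area is shifted away from the occluder, so a crop centered on the visible pixels would no longer be centered on the object. This is the kind of misalignment that the center-aligned features in the abstract are meant to address.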
