Object Classification from 3D Volumetric Data with 3D Capsule Networks

The proliferation of 3D sensors has spurred 3D computer vision research in many application areas, including virtual reality, autonomous navigation, and surveillance. Recently, various methods have been proposed for 3D object classification. Many existing 2D and 3D classification methods rely on convolutional neural networks (CNNs), which are very successful at extracting features from data. However, CNNs cannot sufficiently capture the spatial relationships between features because of their max-pooling layers, and they require vast amounts of training data. In this paper, we propose a model architecture for 3D object classification that extends Capsule Networks (CapsNets) to 3D data. Our proposed architecture, called 3D CapsNet, takes advantage of the fact that a CapsNet preserves the orientation and spatial relationships of the extracted features, and thus requires less data to train the network. We compare our approach with ShapeNet on the ModelNet database and show that our method improves performance, especially as the training set gets smaller.
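The remark about max-pooling can be illustrated with a small sketch. It is not part of the proposed architecture: the helper `max_pool_3d`, the 4x4x4 grid size, and the 2x2x2 window are assumptions chosen only for illustration. Two voxel grids that differ in where a single occupied voxel sits inside a pooling window produce the same pooled output, so the position within the window is lost.

```python
import numpy as np


def max_pool_3d(volume, window=2):
    """Non-overlapping 3D max pooling (hypothetical helper for illustration)."""
    d, h, w = volume.shape
    assert d % window == 0 and h % window == 0 and w % window == 0
    # Split every axis into (block index, offset inside block).
    blocks = volume.reshape(
        d // window, window, h // window, window, w // window, window
    )
    # Take the maximum over the three "offset inside block" axes.
    return blocks.max(axis=(1, 3, 5))


# Two toy occupancy grids: one occupied voxel each, at different
# positions inside the same 2x2x2 pooling window.
grid_a = np.zeros((4, 4, 4), dtype=np.float32)
grid_b = np.zeros((4, 4, 4), dtype=np.float32)
grid_a[0, 0, 0] = 1.0
grid_b[1, 1, 1] = 1.0

pooled_a = max_pool_3d(grid_a)
pooled_b = max_pool_3d(grid_b)

inputs_differ = not np.array_equal(grid_a, grid_b)
pooled_match = np.array_equal(pooled_a, pooled_b)
print("inputs differ:", inputs_differ)
print("pooled outputs identical:", pooled_match)
```

A capsule layer, by contrast, is intended to keep such pose information in its output vectors rather than discarding it; the sketch above only demonstrates the pooling side of that comparison.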
