Volumetric and Multi-view CNNs for Object Classification on 3D Data

3D shape models are becoming widely available and easier to capture, making available 3D information crucial for progress in object classification. Current state-of-theart methods rely on CNNs to address this problem. Recently, we witness two types of CNNs being developed: CNNs based upon volumetric representations versus CNNs based upon multi-view representations. Empirical results from these two types of CNNs exhibit a large gap, indicating that existing volumetric CNN architectures and approaches are unable to fully exploit the power of 3D representations. In this paper, we aim to improve both volumetric CNNs and multi-view CNNs according to extensive analysis of existing approaches. To this end, we introduce two distinct network architectures of volumetric CNNs. In addition, we examine multi-view CNNs, where we introduce multiresolution filtering in 3D. Overall, we are able to outperform current state-of-the-art methods for both volumetric CNNs and multi-view CNNs. We provide extensive experiments designed to evaluate underlying design choices, thus providing a better understanding of the space of methods available for object classification on 3D data.

[1]  Szymon Rusinkiewicz,et al.  Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors , 2003, Symposium on Geometry Processing.

[2]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[3]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[4]  Zhichao Zhou,et al.  DeepPano: Deep Panoramic Representation for 3-D Shape Recognition , 2015, IEEE Signal Processing Letters.

[5]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[6]  Berthold K. P. Horn Extended Gaussian images , 1984, Proceedings of the IEEE.

[7]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[8]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[9]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[10]  Iasonas Kokkinos,et al.  Describing Textures in the Wild , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Jitendra Malik,et al.  Learning Rich Features from RGB-D Images for Object Detection and Segmentation , 2014, ECCV.

[12]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[13]  Wolfram Burgard,et al.  Multimodal deep learning for robust RGB-D object recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[14]  Simon Haykin,et al.  GradientBased Learning Applied to Document Recognition , 2001 .

[15]  Y. LeCun,et al.  Learning methods for generic object recognition with invariance to pose and lighting , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[16]  Jianxiong Xiao,et al.  SUN RGB-D: A RGB-D scene understanding benchmark suite , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Subhransu Maji,et al.  Multi-view Convolutional Neural Networks for 3D Shape Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[18]  Andrew Y. Ng,et al.  Convolutional-Recursive Deep Learning for 3D Object Classification , 2012, NIPS.

[19]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[20]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Matthias Nießner,et al.  Real-time 3D reconstruction at scale using voxel hashing , 2013, ACM Trans. Graph..

[22]  Leonidas J. Guibas,et al.  Shape google: Geometric words and expressions for invariant shape retrieval , 2011, TOGS.

[23]  Sebastian Scherer,et al.  VoxNet: A 3D Convolutional Neural Network for real-time object recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[24]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[25]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[26]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[27]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[29]  Ming Ouhyoung,et al.  On Visual Similarity Based 3D Model Retrieval , 2003, Comput. Graph. Forum.

[30]  Bernard Chazelle,et al.  Shape distributions , 2002, TOGS.

[31]  Andrew Owens,et al.  SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels , 2013, 2013 IEEE International Conference on Computer Vision.

[32]  Marc Levoy,et al.  A volumetric method for building complex models from range images , 1996, SIGGRAPH.

[33]  Iasonas Kokkinos,et al.  Intrinsic shape context descriptors for deformable shapes , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Rahul Sukthankar,et al.  MatchNet: Unifying feature and metric learning for patch-based matching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[36]  Luc Van Gool,et al.  Hough Transform and 3D SURF for Robust Three Dimensional Classification , 2010, ECCV.

[37]  Siddhartha Chaudhuri,et al.  Data-driven suggestions for creativity support in 3D modeling , 2010, ACM Trans. Graph..