3D convolutional neural networks by modal fusion

We propose multi-view and volumetric convolutional neural networks (ConvNets) for 3D shape recognition that combine surface normal and height fields to capture the local geometry and physical size of an object. This strategy helps distinguish between objects with similar geometries but different sizes. It is especially useful for enhancing volumetric ConvNets and for classifying 3D scans with insufficient surface detail. Experimental results on CAD and real-world scan datasets show that our technique outperforms previous approaches.
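As a rough illustration of the two modalities named above, the following sketch derives a surface normal field and a height field from a single height map and stacks them as input channels. It is not the authors' pipeline: the abstract does not say how the modalities are fused, so the channel stacking, the function name `normal_and_height_channels`, and the `pixel_size` parameter are all assumptions made for this example.

```python
import numpy as np

def normal_and_height_channels(height, pixel_size=1.0):
    """Build a hypothetical (H, W, 4) input: unit normals (3) + height (1).

    height: (H, W) array of surface heights in metric units.
    pixel_size: assumed grid spacing, in the same units as height.
    """
    height = np.asarray(height, dtype=np.float64)
    # Finite-difference slopes; np.gradient returns axis 0 (rows, y) first.
    dz_dy, dz_dx = np.gradient(height, pixel_size)
    # For a surface z = h(x, y), an unnormalized normal is (-dh/dx, -dh/dy, 1).
    normals = np.stack([-dz_dx, -dz_dy, np.ones_like(height)], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    # Assumption: fuse by simple channel concatenation.
    return np.concatenate([normals, height[..., None]], axis=-1)

# Two flat patches with the same geometry but different physical heights.
small = normal_and_height_channels(np.full((4, 4), 0.1))
large = normal_and_height_channels(np.full((4, 4), 1.0))
print(np.allclose(small[..., :3], large[..., :3]))  # normal channels compared
print(np.allclose(small[..., 3], large[..., 3]))    # height channels compared
```

In this toy case the normal channels of the two patches are identical, so only the height channel separates them, which is the kind of size cue the abstract describes.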
