Paper Information - Multi-view Convolutional Neural Networks for 3D Shape Recognition

A longstanding question in computer vision concerns the representation of 3D shapes for recognition: should 3D shapes be represented with descriptors operating on their native 3D formats, such as voxel grids or polygon meshes, or can they be effectively represented with view-based descriptors? We address this question in the context of learning to recognize 3D shapes from a collection of their rendered views on 2D images. We first present a standard CNN architecture trained to recognize the shapes' rendered views independently of each other, and show that a 3D shape can be recognized even from a single view at an accuracy far higher than that achieved with state-of-the-art 3D shape descriptors. Recognition rates increase further when multiple views of the shapes are provided. In addition, we present a novel CNN architecture that combines information from multiple views of a 3D shape into a single, compact shape descriptor offering even better recognition performance. The same architecture can be applied to accurately recognize human hand-drawn sketches of shapes. We conclude that a collection of 2D views can be highly informative for 3D shape recognition and is amenable to emerging CNN architectures and their derivatives.
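The abstract does not say how the per-view information is merged into one descriptor. As a minimal sketch, assuming each rendered view has already been mapped to a fixed-length feature vector by some CNN, one plausible choice is an element-wise maximum across views. The function name `view_pool` and the toy feature values are hypothetical and used only for illustration.

```python
from typing import List, Sequence


def view_pool(view_features: Sequence[Sequence[float]]) -> List[float]:
    """Merge per-view feature vectors into one descriptor.

    Assumption: an element-wise maximum across views is used as the
    combining operation; the abstract itself does not specify it.
    """
    if not view_features:
        raise ValueError("need at least one view")
    dim = len(view_features[0])
    if any(len(v) != dim for v in view_features):
        raise ValueError("all views must have the same feature length")
    # zip(*...) groups the i-th feature of every view together.
    return [max(column) for column in zip(*view_features)]


# Three hypothetical views of one shape, each with a 3-dimensional feature.
features = [
    [0.1, 0.9, 0.3],
    [0.7, 0.2, 0.4],
    [0.5, 0.6, 0.8],
]
descriptor = view_pool(features)
print(descriptor)  # [0.7, 0.9, 0.8]
```

Because the maximum does not depend on the order of its inputs, the resulting descriptor stays the same if the views are shuffled, and its length does not grow with the number of views.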
