Multi-view Convolutional Neural Networks for 3D Shape Recognition

A longstanding question in computer vision concerns the representation of 3D shapes for recognition: should 3D shapes be represented with descriptors operating on their native 3D formats, such as voxel grid or polygon mesh, or can they be effectively represented with view-based descriptors? We address this question in the context of learning to recognize 3D shapes from a collection of their rendered views on 2D images. We first present a standard CNN architecture trained to recognize the shapes' rendered views independently of each other, and show that a 3D shape can be recognized even from a single view at an accuracy far higher than using state-of-the-art 3D shape descriptors. Recognition rates further increase when multiple views of the shapes are provided. In addition, we present a novel CNN architecture that combines information from multiple views of a 3D shape into a single and compact shape descriptor offering even better recognition performance. The same architecture can be applied to accurately recognize human hand-drawn sketches of shapes. We conclude that a collection of 2D views can be highly informative for 3D shape recognition and is amenable to emerging CNN architectures and their derivatives.

[1]  Bui Tuong Phong Illumination for computer generated pictures , 1975, Commun. ACM.

[2]  Berthold K. P. Horn Extended Gaussian images , 1984, Proceedings of the IEEE.

[3]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[4]  Ali Shokoufandeh,et al.  View-based 3-D object recognition using shock graphs , 2002, Object recognition supported by user interaction for service robots.

[5]  Bernard Chazelle,et al.  Shape distributions , 2002, TOGS.

[6]  Szymon Rusinkiewicz,et al.  Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors , 2003, Symposium on Geometry Processing.

[7]  Ming Ouhyoung,et al.  On Visual Similarity Based 3D Model Retrieval , 2003, Comput. Graph. Forum.

[8]  J. Koenderink,et al.  The singularities of the visual mapping , 1976, Biological Cybernetics.

[9]  Benjamin B. Kimia,et al.  A Similarity-Based Aspect-Graph Approach to 3D Object Recognition , 2004, International Journal of Computer Vision.

[10]  Y. LeCun,et al.  Learning methods for generic object recognition with invariance to pose and lighting , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[11]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[12]  Hiroshi Murase,et al.  Visual learning and recognition of 3-d objects from appearance , 2005, International Journal of Computer Vision.

[13]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[14]  Luc Van Gool,et al.  Hough Transform and 3D SURF for Robust Three Dimensional Classification , 2010, ECCV.

[15]  Siddhartha Chaudhuri,et al.  Data-driven suggestions for creativity support in 3D modeling , 2010, ACM Trans. Graph..

[16]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[17]  Arjan Kuijper,et al.  Sketch-based 3D model retrieval using diffusion tensor fields of suggestive contours , 2010, ACM Multimedia.

[18]  Andrea Vedaldi,et al.  Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.

[19]  Leonidas J. Guibas,et al.  Shape google: Geometric words and expressions for invariant shape retrieval , 2011, TOGS.

[20]  Kun Zhou,et al.  Discriminative Sketch‐based 3D Model Retrieval via Robust Shape Matching , 2011, Comput. Graph. Forum.

[21]  Iasonas Kokkinos,et al.  Intrinsic shape context descriptors for deformable shapes , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Marc Alexa,et al.  How do humans sketch objects? , 2012, ACM Trans. Graph..

[23]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[24]  Marc Alexa,et al.  Sketch-based shape retrieval , 2012, ACM Trans. Graph..

[25]  Yoshua Bengio,et al.  Maxout Networks , 2013, ICML.

[26]  Thomas Mensink,et al.  Image Classification with the Fisher Vector: Theory and Practice , 2013, International Journal of Computer Vision.

[27]  Andrew Zisserman,et al.  Fisher Vector Faces in the Wild , 2013, BMVC.

[28]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[29]  Iasonas Kokkinos,et al.  Describing Textures in the Wild , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Tinne Tuytelaars,et al.  Sketch classification and classification-driven analysis using Fisher vectors , 2014, ACM Trans. Graph..

[32]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[33]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[34]  Andrew Zisserman,et al.  Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.

[35]  A. Khosla,et al.  A Deep Representation for Volumetric Shape Modeling , 2015 .

[36]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Song Wu,et al.  3 D ShapeNets : A Deep Representation for Volumetric Shape Modeling , 2015 .

[38]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.