Multi-View Saliency Guided Deep Neural Network for 3-D Object Retrieval and Classification

In this paper, we propose the multi-view saliency guided deep neural network (MVSG-DNN) for 3D object retrieval and classification. The method consists of three key modules. First, the model projection rendering module captures multiple views of a 3D object. Second, the visual context learning module applies a basic convolutional neural network (CNN) to extract visual features from individual views and then employs a saliency LSTM to adaptively select representative views based on multi-view context. Finally, with this information, the multi-view representation learning module compiles the 3D object descriptors with the designed classification LSTM for 3D object retrieval and classification. MVSG-DNN makes two main contributions: 1) it jointly realizes the selection of representative views and the similarity measure by fully exploiting multi-view context; 2) it discovers the discriminative structure of a multi-view sequence without the constraints of specific camera settings. Consequently, it supports flexible 3D object retrieval and classification in real applications, since no particular camera setup is required. Extensive comparison experiments on ModelNet10, ModelNet40, and ShapeNetCore55 demonstrate the superiority of MVSG-DNN over state-of-the-art methods.
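The data flow described above (per-view features, saliency-based view weighting, a single object descriptor, similarity-based retrieval) can be illustrated with a minimal sketch. This is not the MVSG-DNN architecture: a softmax over a linear score stands in for the saliency LSTM, a weighted sum stands in for the classification LSTM, and the per-view feature vectors are assumed to come from some CNN that is not shown. All function names and the `scoring_vector` parameter are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def saliency_weights(view_features, scoring_vector):
    """Assign each view a weight in (0, 1); the weights sum to 1.

    Assumption: a linear score followed by softmax is used here as a
    simple stand-in for the paper's saliency LSTM.
    """
    scores = [
        sum(f * w for f, w in zip(feat, scoring_vector))
        for feat in view_features
    ]
    return softmax(scores)


def object_descriptor(view_features, scoring_vector):
    """Aggregate per-view features into one descriptor.

    Assumption: a saliency-weighted sum replaces the paper's
    classification LSTM.
    """
    weights = saliency_weights(view_features, scoring_vector)
    dim = len(view_features[0])
    return [
        sum(w * feat[d] for w, feat in zip(weights, view_features))
        for d in range(dim)
    ]


def cosine_similarity(a, b):
    """Similarity measure used to rank objects in retrieval."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy per-view features for two objects (hypothetical values).
query_views = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
candidate_views = [[0.9, 0.1], [0.2, 0.8]]
scoring_vector = [0.5, -0.5]

query_desc = object_descriptor(query_views, scoring_vector)
candidate_desc = object_descriptor(candidate_views, scoring_vector)
similarity = cosine_similarity(query_desc, candidate_desc)
```

In this simplified form the number of views may differ between objects, which loosely mirrors the abstract's point about not depending on a fixed camera setup; the sequence modelling that the LSTMs provide is not reproduced here.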
