Sliced voxel representations with LSTM and CNN for 3D shape recognition

We propose a sliced voxel representation, called Sliced Square Voxels (SSV), for three-dimensional shape recognition, built on LSTM (Long Short-Term Memory) and CNN (Convolutional Neural Network) models. Given an arbitrary 3D model, we first convert it into a binary voxel grid of size 32×32×32. Then, with the view position fixed, we slice the binary voxel data vertically along the depth direction. A CNN is applied to the sliced voxels to exploit their 2D projected shape information. Our main idea is to feed the CNN output into an LSTM, which is expected to capture the spatial topology across the slices. In our experiments, the proposed method outperformed a baseline that we built using a 3D CNN. We also compared it with related previous methods on large-scale 3D model datasets (ModelNet10 and ModelNet40), and the proposed method outperformed them.
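The slicing step can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `slice_voxels`, the choice of axis 2 as the depth direction, and the toy sphere used as input are all assumptions made for illustration. The sketch only shows how a 32×32×32 binary grid becomes an ordered sequence of 32 two-dimensional slices, which is the form a per-slice CNN followed by an LSTM would consume.

```python
import numpy as np


def slice_voxels(voxels: np.ndarray, depth_axis: int = 2) -> np.ndarray:
    """Return the grid as a sequence of 2D slices ordered along depth_axis.

    The result has shape (D, H, W): one 2D binary image per depth step.
    (Hypothetical helper; the axis convention is an assumption.)
    """
    if voxels.ndim != 3:
        raise ValueError("expected a 3D occupancy grid")
    # Move the depth axis to the front so that slices[k] is the k-th slice.
    return np.moveaxis(voxels, depth_axis, 0)


# Toy input (assumption): a solid sphere voxelized on a 32x32x32 binary grid.
n = 32
idx = np.arange(n)
x, y, z = np.meshgrid(idx, idx, idx, indexing="ij")
center = (n - 1) / 2
grid = ((x - center) ** 2 + (y - center) ** 2 + (z - center) ** 2 <= 10 ** 2)
grid = grid.astype(np.uint8)

slices = slice_voxels(grid, depth_axis=2)
print(slices.shape)  # (32, 32, 32): 32 slices, each 32x32
```

Each `slices[k]` would then be passed through a shared 2D CNN, and the resulting sequence of feature vectors would be given to an LSTM in depth order; those two stages are omitted here because the abstract does not specify their architecture.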
