Transcoding across 3D shape representations for unsupervised learning of 3D shape feature

Abstract

Unsupervised learning of 3D shape features is a challenging yet important problem for organizing large collections of 3D shape models that lack annotations. Recently proposed neural network-based approaches attempt to learn meaningful 3D shape features by autoencoding a single 3D shape representation, such as voxels, 3D point sets, or multiview 2D images. However, a single shape representation is not sufficient for training an effective 3D shape feature extractor, as none of the existing shape representations can fully describe the geometry of 3D shapes by itself. In this paper, we propose transcoding across multiple 3D shape representations as an unsupervised method for obtaining expressive 3D shape features. A neural network called Shape Auto-Transcoder (SAT) learns to extract 3D shape features via cross-prediction of multiple heterogeneous 3D shape representations. The architecture and training objective of SAT are carefully designed to obtain an effective feature embedding. Experimental evaluation in 3D model retrieval and 3D model classification scenarios demonstrates both the high accuracy and the compactness of the proposed 3D shape feature. The code for SAT is available at https://github.com/takahikof/ShapeAutoTranscoder .
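The abstract describes SAT only as "cross-prediction of multiple heterogeneous 3D shape representations". To make the idea of a transcoding objective concrete, the following is a minimal Python sketch under stated assumptions; it is not SAT's actual architecture or loss. Assumed here: only two representations (a 3D point set and a flattened voxel occupancy grid; multiview images are omitted), Chamfer distance and binary cross-entropy as the per-representation reconstruction losses, and hypothetical callables `encoders`/`decoders` standing in for trained neural networks. Whether same-representation (autoencoding) terms are also included is an assumption exposed through the illustrative `include_self` flag.

```python
import math
from itertools import permutations, product
from typing import Any, Callable, Dict, Sequence, Tuple

Point = Tuple[float, float, float]


def chamfer_distance(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean squared distance to the nearest
    neighbour, computed in both directions and summed."""
    def sq_dist(p: Point, q: Point) -> float:
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    pred_to_target = sum(min(sq_dist(p, q) for q in target) for p in pred) / len(pred)
    target_to_pred = sum(min(sq_dist(q, p) for p in pred) for q in target) / len(target)
    return pred_to_target + target_to_pred


def binary_cross_entropy(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted occupancy probabilities
    and a 0/1 voxel occupancy target (both flattened)."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(target)


def transcoding_loss(
    inputs: Dict[str, Any],
    encoders: Dict[str, Callable[[Any], Any]],
    decoders: Dict[str, Callable[[Any], Any]],
    losses: Dict[str, Callable[[Any, Any], float]],
    include_self: bool = False,  # hypothetical flag: also add autoencoding terms
) -> float:
    """Encode each source representation, decode the latent into every
    target representation, and sum the per-target reconstruction losses.
    With include_self=False only cross-predictions (source != target) count."""
    names = list(inputs)
    pairs = product(names, repeat=2) if include_self else permutations(names, 2)
    total = 0.0
    for src, tgt in pairs:
        z = encoders[src](inputs[src])  # latent feature from the source representation
        pred = decoders[tgt](z)         # prediction of the target representation
        total += losses[tgt](pred, inputs[tgt])
    return total


# Toy usage with hypothetical stand-in encoders/decoders (not trained networks).
inputs = {"points": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], "voxels": [1.0, 0.0]}
encoders = {name: (lambda x: [0.0]) for name in inputs}
decoders = {"points": lambda z: [(0.0, 0.0, 0.0)], "voxels": lambda z: [0.9, 0.1]}
losses = {"points": chamfer_distance, "voxels": binary_cross_entropy}
loss = transcoding_loss(inputs, encoders, decoders, losses)
print(f"toy transcoding loss: {loss:.4f}")
```

In a real system the encoders and decoders would be neural networks and this sum would be minimized by gradient descent in a deep learning framework; the point of the sketch is only that each representation's latent feature must carry enough information to reconstruct the other representations, which is what pushes the learned feature beyond what any single representation captures.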
