Learning a Hierarchical Latent-Variable Model of Voxelized 3D Shapes

We propose the Variational Shape Learner (VSL), a hierarchical latent-variable model for 3D shape learning. VSL takes an unsupervised, variational approach to inferring and learning the underlying structure of voxelized 3D shapes. The model learns a hierarchical latent representation of 3D shapes, made possible through the use of skip-connections. Realistic 3D objects can be generated by sampling from the model's latent probabilistic manifold. We show that our inference and generative model can be trained end-to-end from 2D images to perform single-image 3D model retrieval. Experiments across a range of tasks show improved performance of our model, both quantitatively and qualitatively.
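
The abstract's ingredients (variational inference, a hierarchy of latent codes, and skip-connections into the decoder) can be made concrete with a small sketch. The code below is not the VSL implementation. It is a minimal, pure-Python, single-sample estimate of the evidence lower bound (ELBO) under assumptions of our own: diagonal-Gaussian posteriors, a standard-normal prior at every level, a Bernoulli likelihood per voxel, and each level's posterior conditioned on the sample drawn at the level above. The names `hierarchical_elbo`, `level_encoders`, and `decode`, and the toy encoders and decoder in the usage section, are hypothetical stand-ins for the neural networks a real model would use.

```python
import math
import random

# Minimal sketch (not the VSL implementation): a single-sample ELBO estimate for a
# hierarchy of latent codes over binary voxels. Assumed: diagonal-Gaussian posteriors,
# N(0, I) priors at every level, and an independent Bernoulli likelihood per voxel.

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (the reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))

def bernoulli_log_likelihood(voxels, probs, eps=1e-7):
    """log p(x | z) for a flattened 0/1 occupancy grid; probabilities are clipped for stability."""
    total = 0.0
    for x, p in zip(voxels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += x * math.log(p) + (1 - x) * math.log(1.0 - p)
    return total

def hierarchical_elbo(voxels, level_encoders, decode, rng):
    """One-sample estimate of E_q[log p(x | z_1..z_L)] - sum_i KL(q(z_i | x, z_{i-1}) || p(z_i)).

    level_encoders: hypothetical callables (x, z_prev) -> (mu, logvar), one per level;
        each level is conditioned on the sample drawn at the level above it.
    decode: hypothetical callable mapping the concatenated codes to per-voxel probabilities.
    """
    codes, kl, z_prev = [], 0.0, []
    for encode_level in level_encoders:
        mu, logvar = encode_level(voxels, z_prev)
        z = reparameterize(mu, logvar, rng)
        kl += kl_to_standard_normal(mu, logvar)  # KL given the sampled parent code
        codes.append(z)
        z_prev = z
    # Skip connections (as assumed here): every level's code goes straight to the decoder.
    z_all = [v for z in codes for v in z]
    return bernoulli_log_likelihood(voxels, decode(z_all)) - kl

if __name__ == "__main__":
    rng = random.Random(0)
    x = [1, 0, 1, 1, 0, 0, 1, 0]  # toy 2x2x2 occupancy grid, flattened

    def global_level(x, z_prev):  # hypothetical top-level encoder (ignores z_prev)
        m = sum(x) / len(x) - 0.5
        return [m, -m], [-1.0, -1.0]

    def local_level(x, z_prev):  # hypothetical lower level, conditioned on z_prev
        return [0.5 * v for v in z_prev], [-2.0] * len(z_prev)

    def decode(z):  # hypothetical decoder: one sigmoid output per voxel
        s = sum(z)
        return [1.0 / (1.0 + math.exp(-(k % 3 - 1) * s)) for k in range(len(x))]

    print(hierarchical_elbo(x, [global_level, local_level], decode, rng))
```

In this sketch, concatenating every level's code before decoding stands in for the skip-connections mentioned in the abstract. In practice the estimate would be averaged over a minibatch and maximized with respect to the encoder and decoder parameters using an automatic-differentiation framework, which plain Python lists do not provide.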
