Paper information - Learning a Hierarchical Latent-Variable Model of 3D Shapes

Learning a Hierarchical Latent-Variable Model of 3D Shapes

We propose the Variational Shape Learner (VSL), a generative model that learns the underlying structure of voxelized 3D shapes in an unsupervised fashion. Through the use of skip-connections, our model can successfully learn and infer a latent, hierarchical representation of objects. Furthermore, realistic 3D objects can be easily generated by sampling the VSL's latent probabilistic manifold. We show that our generative model can be trained end-to-end from 2D images to perform single-image 3D model retrieval. Experiments show, both quantitatively and qualitatively, the improved generalization of our proposed model over a range of tasks, where it performs better than or comparably to various state-of-the-art alternatives.

C. Lee Giles | Alexander Ororbia | Shikun Liu
