Adversarial autoencoders for compact representations of 3D point clouds

Deep generative architectures provide a way to model not only images but also complex three-dimensional objects, such as point clouds. In this work, we present a novel method for obtaining meaningful representations of 3D shapes that can be used for challenging tasks, including 3D point generation, reconstruction, compression, and clustering. In contrast to existing methods for 3D point cloud generation, which train separate, decoupled models for representation learning and generation, our approach is the first end-to-end solution that simultaneously learns a latent representation space and generates 3D shapes from it. Moreover, our model is capable of learning meaningful compact binary descriptors through adversarial training conducted on the latent space. To achieve this goal, we extend a deep Adversarial Autoencoder (AAE) model to accept 3D input and produce 3D output. Thanks to our end-to-end training regime, the resulting method, called 3D Adversarial Autoencoder (3dAAE), obtains either a binary or a continuous latent space that covers a much wider portion of the training data distribution. Finally, our quantitative evaluation shows that 3dAAE provides state-of-the-art results for 3D point clustering and 3D object retrieval.
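
To illustrate how compact binary descriptors can support 3D object retrieval, the following is a minimal sketch in plain Python. It is not the 3dAAE implementation: the latent codes are hypothetical stand-ins for encoder outputs, the 0.5 threshold assumes each latent dimension lies in [0, 1], and the helper names `binarize`, `hamming`, and `retrieve` are introduced here for illustration only.

```python
from typing import List, Sequence


def binarize(code: Sequence[float], threshold: float = 0.5) -> int:
    """Pack a continuous latent code into an integer bit string.

    Thresholding at 0.5 is an assumption for illustration; it presumes each
    latent dimension lies in [0, 1].
    """
    bits = 0
    for value in code:
        bits = (bits << 1) | (1 if value > threshold else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two packed descriptors differ."""
    return bin(a ^ b).count("1")


def retrieve(query: int, database: List[int], k: int = 1) -> List[int]:
    """Return indices of the k database descriptors closest to the query."""
    order = sorted(range(len(database)), key=lambda i: hamming(query, database[i]))
    return order[:k]


# Hypothetical 4-dimensional latent codes standing in for encoder outputs.
latent_codes = [
    [0.9, 0.1, 0.8, 0.2],
    [0.1, 0.9, 0.2, 0.7],
    [0.8, 0.2, 0.9, 0.6],
]
database = [binarize(c) for c in latent_codes]
query = binarize([0.95, 0.05, 0.7, 0.1])
nearest = retrieve(query, database, k=2)  # indices ranked by Hamming distance
```

Packing each descriptor into a single integer keeps storage small and reduces comparison to an XOR followed by a bit count, which is the usual motivation for binary codes in retrieval.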
