Point Cloud Deformation for Single-Image 3D Reconstruction

We propose an approach to reconstruct a precise and dense 3D point cloud from a single image. Previous works either reconstructed a full 3D shape representation or directly regressed point locations from the image. However, the former incurs the overhead of constructing the 3D shape and is inefficient due to its high computing cost, while the latter does not scale well because the number of trainable parameters depends on the number of output points. In this paper, we explore a method to infer a point cloud representation from an input image. We extract shape information from the input image and embed two kinds of shape features into the point cloud: point-specific and global shape features. We then deform a randomly generated point cloud into the final representation based on the embedded features. Our method requires no shape-construction overhead, and it is efficient and scalable because the number of trainable parameters is independent of the point cloud size; to our knowledge, it is the first method able to do so. Thorough experimental results suggest that the proposed method outperforms other state-of-the-art methods in generating dense and precise point clouds.
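The pipeline described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual architecture: the function and weight names (`deform_point_cloud`, `init_weights`) and the feature dimensions are hypothetical, and random weights stand in for trained ones. The point to notice is that the per-point MLP is shared across all points, so the parameter count depends only on the feature dimensions, never on the number of points N.

```python
import numpy as np

def deform_point_cloud(points, point_feats, global_feat, weights):
    """Deform a randomly generated seed point cloud toward a target shape.

    points:      (N, 3) randomly generated seed points
    point_feats: (N, F) point-specific shape features extracted from the image
    global_feat: (G,)   global shape feature extracted from the image
    weights:     shared per-point MLP weights; their sizes do not depend on N
    """
    n = points.shape[0]
    # Broadcast the global shape feature to every point, then embed both
    # kinds of shape information alongside each point's coordinates.
    g = np.tile(global_feat, (n, 1))                       # (N, G)
    x = np.concatenate([points, point_feats, g], axis=1)   # (N, 3 + F + G)
    # Shared two-layer MLP predicts a per-point displacement.
    h = np.maximum(x @ weights["w1"] + weights["b1"], 0.0)  # ReLU hidden layer
    delta = h @ weights["w2"] + weights["b2"]               # (N, 3) displacement
    return points + delta

def init_weights(f, g, hidden, rng):
    """Initialize MLP weights. Note: no dependence on the point count N."""
    d_in = 3 + f + g
    return {
        "w1": rng.standard_normal((d_in, hidden)) * 0.1,
        "b1": np.zeros(hidden),
        "w2": rng.standard_normal((hidden, 3)) * 0.1,
        "b2": np.zeros(3),
    }
```

Because the weights are shared across points, the same `weights` dictionary deforms a 64-point cloud or a 4096-point cloud unchanged, which is the scalability property the abstract claims.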
