DV-Net: Dual-view network for 3D reconstruction by fusing multiple sets of gated control point clouds

Abstract Deep learning approaches to 3D reconstruction have recently shown promising advantages, with 3D shapes predicted from a single RGB image. However, such works are often limited by a single feature cue, which does not capture the 3D shape of objects well. To address this problem, this paper proposes an end-to-end 3D reconstruction approach that predicts a 3D point cloud from dual-view RGB images. It consists of several processing parts. A dual-view 3D reconstruction network predicts an object's point clouds from two RGB images with different views, avoiding the limitation of a single feature cue. A structure feature learning network then extracts structure features with stronger representational ability from the point clouds. Finally, a gated control network for data fusion takes the two sets of point clouds from the different views as input and fuses them into a single output. The proposed approach is thoroughly evaluated with extensive experiments on the widely used ShapeNet dataset. Both the qualitative results and quantitative analysis demonstrate that this method not only captures the detailed geometric structures of 3D shapes for different object categories with complex topologies, but also achieves state-of-the-art performance.
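The gated fusion idea described above can be sketched in a minimal form: a per-point gate in (0, 1) is computed from the two candidate point sets, and the fused cloud is their gate-weighted convex combination. This is only an illustrative sketch, not the paper's actual network; the function `gated_fuse` and the parameters `W` and `b` are hypothetical stand-ins for learned weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fuse(p1, p2, W, b):
    """Fuse two point sets of shape (N, 3) with a learned per-point gate.

    A gate g in (0, 1) is computed from the concatenated coordinates of
    corresponding points; the fused cloud is g * p1 + (1 - g) * p2, so each
    fused point lies between the two view-specific predictions.
    """
    feats = np.concatenate([p1, p2], axis=1)   # (N, 6) per-point features
    g = sigmoid(feats @ W + b)                 # (N, 1) gate per point
    return g * p1 + (1.0 - g) * p2             # (N, 3) fused point cloud

# Toy usage with random weights standing in for trained parameters.
rng = np.random.default_rng(0)
p1 = rng.standard_normal((1024, 3))            # point cloud from view 1
p2 = rng.standard_normal((1024, 3))            # point cloud from view 2
W = rng.standard_normal((6, 1)) * 0.1
b = np.zeros(1)
fused = gated_fuse(p1, p2, W, b)
print(fused.shape)
```

Because the gate is a convex weight, every fused coordinate stays within the range spanned by the two input clouds, which keeps the fusion step stable regardless of which view dominates.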
