论文信息 - Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images

Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images

Recovering the 3D representation of an object from single-view or multi-view RGB images by deep neural networks has attracted increasing attention in the past few years. Several mainstream works (e.g., 3D-R2N2) use recurrent neural networks (RNNs) to fuse multiple feature maps extracted from input images sequentially. However, when given the same set of input images with different orders, RNN-based approaches are unable to produce consistent reconstruction results. Moreover, due to long-term memory loss, RNNs cannot fully exploit input images to refine reconstruction results. To solve these problems, we propose a novel framework for single-view and multi-view 3D reconstruction, named Pix2Vox. By using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image. Then, a context-aware fusion module is introduced to adaptively select high-quality reconstructions for each part (e.g., table legs) from different coarse 3D volumes to obtain a fused 3D volume. Finally, a refiner further refines the fused 3D volume to generate the final output. Experimental results on the ShapeNet and Pix3D benchmarks indicate that the proposed Pix2Vox outperforms state-of-the-arts by a large margin. Furthermore, the proposed method is 24 times faster than 3D-R2N2 in terms of backward inference time. The experiments on ShapeNet unseen 3D categories have shown the superior generalization abilities of our method.

[1] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[2] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.

[3] Jitendra Malik,et al. Shape, Illumination, and Reflectance from Shading , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4] Yang Zhang,et al. RealPoint3D: An Efficient Generation Network for 3D Object Reconstruction From a Single Image , 2019, IEEE Access.

[5] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[6] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.

[7] Hao Su,et al. A Point Set Generation Network for 3D Object Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Jiajun Wu,et al. MarrNet: 3D Shape Reconstruction via 2.5D Sketches , 2017, NIPS.

[9] Jiajun Wu,et al. Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling , 2016, NIPS.

[10] Jianxiong Xiao,et al. 3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .

[12] Alexei A. Efros,et al. Multi-view Supervision for Single-View Reconstruction via Differentiable Ray Consistency , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[14] Wei Liu,et al. Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images , 2018, ECCV.

[15] A. U.S.,et al. Recovering Surface Shape and Orientation from Texture , 2002 .

[16] Silvio Savarese,et al. 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction , 2016, ECCV.

[17] R. Venkatesh Babu,et al. 3D-LMNet: Latent Embedding Matching for Accurate and Diverse 3D Point Cloud Reconstruction from a Single Image , 2018, BMVC.

[18] Razvan Pascanu,et al. On the difficulty of training recurrent neural networks , 2012, ICML.

[19] Leonidas J. Guibas,et al. Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[20] Thomas Brox,et al. Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[22] José Ruíz Ascencio,et al. Visual simultaneous localization and mapping: a survey , 2012, Artificial Intelligence Review.

[23] Markus H. Gross,et al. Human Shape from Silhouettes Using Generative HKS Descriptors and Cross-Modal Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Ronen Basri,et al. A Survey on Structure from Motion , 2017, ArXiv.

[25] Samy Bengio,et al. Order Matters: Sequence to sequence for sets , 2015, ICLR.

[26] Jitendra Malik,et al. Learning a Multi-View Stereo Machine , 2017, NIPS.

[27] Silvio Savarese,et al. Beyond PASCAL: A benchmark for 3D object detection in the wild , 2014, IEEE Winter Conference on Applications of Computer Vision.

[28] Richard Szeliski,et al. A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[29] Jiajun Wu,et al. Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30] Shubham Tulsiani,et al. Learning Single-view 3D Reconstruction of Objects and Scenes , 2018 .

[31] Yang Liu,et al. O-CNN , 2017, ACM Trans. Graph..

[32] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[33] Wonyong Sung,et al. Single stream parallelization of generalized LSTM-like RNNs on a GPU , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[34] Meng Wang,et al. 3DensiNet: A Robust Neural Network Architecture towards 3D Volumetric Object Prediction from 2D Image , 2017, ACM Multimedia.

[35] Stefan Roth,et al. Discriminative shape from shading in uncalibrated illumination , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36] Bo Yang,et al. Dense 3D Object Reconstruction from a Single Depth View , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37] Pietro Perona,et al. 3D Reconstruction by Shadow Carving: Theory and Practical Evaluation , 2007, International Journal of Computer Vision.

[38] Ozan ARSLAN. 3d Object Reconstruction from a Single Image. , 2014 .